Posts for the month of March 2011

iTunes Store parasites

So I was looking whether the name Monsterz was already taken on the iTunes app store. Unfortunately, it was: synx.us’s Monsterz has been there for a while. It’s a rather poor game where “the aim of the game is to push as many tiles off the screen as possible within the time limit”, but well, that’s the rule: first-come, first-served.

However, I had a look at another linked application by synx.us, and discovered Baseballz, which looked quite similar. Of course it looked similar: in Baseballz, “the aim of the game is to push as many tiles off the screen as possible within the time limit”. Oh well, why not? Who would call a game Baseballz anyway?

Then I found Horrorz, where “the aim of the game is to push as many tiles off the screen as possible within the time limit”. In Soccerz, “the aim of the game is to push as many tiles off the screen as possible within the time limit”. Then came Flagz, Golfz, Razing and Eazter. Eazter? Seriously, what the fuck? And there is more! Tenniz, Valentines, St Patrick’s Day, Blox, Jox, Bikez and fucking Footballz. All these games have the exact same gameplay and even the same title screen text.

Congratulations, Apple. You won’t allow an Obama trampoline jumping app yet fifteen times the same fucking application from obvious namespace parasites is OK.

Understanding fast float/integer conversions

If you are interested in micro-optimisation and have never studied IEEE-754 float representation, I suggest you have a look at the beast. It may give you ideas for interesting bitlevel operations. This article will cover the specific topic of conversions between integers and floats.

Note: unless you are coding for antique or very specific architectures such as the PDP-11, you may assume that the floating point storage endianness and the integer endianness match. The code presented here will therefore work flawlessly on modern CPU architectures such as x86, amd64, PowerPC or even ARM.

Introduction

Here are a few floating point values and their bitlevel representation. Notice how the different values affect the sign, exponent and mantissa fields:

sign exponent mantissa
1.0f 0 01111111 0000000 00000000 00000000
-1.0f 1 01111111 0000000 00000000 00000000
0.5f 0 01111110 0000000 00000000 00000000
0.25f 0 01111101 0000000 00000000 00000000
1.0f + 0.5f 0 01111111 1000000 00000000 00000000
1.0f + 0.25f 0 01111111 0100000 00000000 00000000

The core idea behind this article is the manipulation of the last field, the mantissa.

Byte to float conversion

A classical byte (0 - 255) to float (0.0f - 1.0f) conversion function is shown here:

float u8tofloat(uint8_t x)
{
    return (float)x * (1.0f / 255.0f);
}

This looks very simple: one conversion (fild on x86) and one multiplication (fmul on x86). However, the fild instruction has a latency such that the conversion may have a severe impact on performance.

But let’s look at these interesting floating point values:

sign exponent mantissa
32768.0f 0 10001110 0000000 00000000 00000000
32768.0f + 1.0f/256.0f 0 10001110 0000000 00000000 00000001
32768.0f + 2.0f/256.0f 0 10001110 0000000 00000000 00000010
32768.0f + 255.0f/256.0f 0 10001110 0000000 00000000 11111111

Notice the last eight bits? They look almost exactly like the input byte expected by u8tofloat. Taking advantage of the binary representation allows us to write the following conversion function:

float u8tofloat_trick(uint8_t x)
{
    union { float f; uint32_t i; } u; u.f = 32768.0f; u.i |= x;
    return u.f - 32768.0f;
}

When used in a CPU-intensive loop, this method can be up to twice as fast as the previous implementation, for instance on the amd64 architecture. On the x86 architecture, the difference is far less noticeable.

You probably noticed that the output range is 0.0f - 255.0f/256.0f instead of 0.0f - 1.0f. This may be preferred in some cases when the value is supposed to wrap around. However, colour coordinates will require exact 0.0f - 1.0f bounds. This is easily fixed with an additional multiplication:

float u8tofloat_trick2(uint8_t x)
{
    union { float f; uint32_t i; } u; u.f = 32768.0f; u.i |= x;
    return (u.f - 32768.0f) * (256.0f / 255.0f);
}

This can still be up to twice as fast than the original integer to float cast.

Short to float conversion

The usual way to convert a 16-bit integer to a float will be:

float u16tofloat(uint16_t x)
{
    return (float)x * (1.0f / 65535.0f);
}

Again, careful observation of the following floats will be useful:

sign exponent mantissa
16777216.0f 0 10010111 0000000 00000000 00000000
16777216.0f + 1.0f/65536.0f 0 10010111 0000000 00000000 00000001
16777216.0f + 2.0f/65536.0f 0 10010111 0000000 00000000 00000010
16777216.0f + 65535.0f/65536.0f 0 10010111 0000000 11111111 11111111

And the resulting conversion method:

float u16tofloat_trick(uint16_t x)
{
    union { float f; uint32_t i; } u; u.f = 16777216.0f; u.i |= x;
    return u.f - 16777216.0f; // optionally: (u.f - 16777216.0f) * (65536.0f / 65535.0f)
}

However, due to the size of the input data, the performance gain here can be much less visible. Be sure to properly benchmark.

Int to float conversion

The above techniques cannot be directly applied to 32-bit integers because floats only have a 23-bit mantissa. Several methods are possible:

  • Use the double type instead of float. They have a 52-bit mantissa.
  • Reduce the input int precision to 23 bits.

Float to int conversion

Finally, the exact same technique can be used for the inverse conversion. This is the naive implementation:

static inline uint8_t u8fromfloat(float x)
{
    return (int)(x * 255.0f);
}

Clamping is left as an exercise to the reader. Also note that a value such as 255.99999f will ensure better distribution and avoid singling out the 1.0f value.

And our now familiar bitlevel trick:

static inline uint8_t u8fromfloat_trick(float x)
{
    union { float f; uint32_t i; } u;
    u.f = 32768.0f + x * (255.0f / 256.0f);
    return (uint8_t)u.i;
}

Unfortunately, this is usually a performance hit on amd64. However, on x86, it is up to three time as fast as the original. Choose wisely!

The LUT strategy

Some will point out that using a lookup table is much faster.

float lut[256];
void fill_lut()
{
    for (int n = 0; n < 256; n++) lut[n] = (float)n / 255.0f;
}
float u8tofloat_lut(uint8_t x)
{
    return lut[x];
}

This is indeed faster in many cases and should not be overlooked. However, the following should be taken into account:

  • LUTs are fast, but if unlucky, the cache may get in your way and cause performance issues
  • the LUT approach is actually almost always slower with 16-bit input, because the size of the table starts messing with the cache
  • do not underestimate the time needed to fill the LUT, especially if different conversions need to be performed, requiring several LUTs
  • LUTs do not mix well with SIMD instructions
  • obviously, this method doesn’t work with float to int conversions

Last warnings

Many programmers will be tempted to write shorter code such as:

float u8tofloat_INVALID(uint8_t x)
{
    // NO! DON’T DO THIS! EVER!
    float f = 32768.0f; *(uint32_t *)&f |= x;
    return f - 32768.0f;
}

Do not do this, ever! I guarantee that this will break in very nasty and unexpected places. Strict C and C++ aliasing rules make it illegal to have a pointer to a float also be a pointer to an integer. The only legal way to do this is to use a union (actually, this is still not legal by the C++ standard but most real-life compilers allow this type punning through documented extensions).

Finally, one last, obvious tip: always measure the effects of an optimisation before deciding to use it!

Fuck you, Microsoft: the environment variable windows

Why, oh why, is the environment variable window (accessible from the system preferences) such an atrocious experience, beyond the limits of human pain tolerance?

Why does the window not have a fucking resize handle? Why do I have to click on scrollbar handles smaller than my fucking mouse pointer to browse my environment variables? Why is there no way to search or replace strings? Why is the intern who wrote this GUI probably a top executive now?

Build and run Android NDK applications without Eclipse

If you already have a development environment and do not wish to use Eclipse, you can easily build and run your NDK application from makefiles or the command line.

First of all, you need to set the ANDROID_NDK_ROOT environment variable and ensure the SDK and NDK binary directories are in PATH. Here are my definitions:

ANDROID_NDK_ROOT=/home/sam/android/android-ndk-r8
PATH="$PATH:$ANDROID_NDK_ROOT"
PATH="$PATH:/home/sam/android/android-sdk-linux_x86/platform-tools"
PATH="$PATH:/home/sam/android/android-sdk-linux_x86/tools"

This is best defined in one of your shell’s startup scripts such as .zshenv.

Build and install package

Now, whenever you are in an NDK project’s directory, build the project using:

ndk-build && ant release

And to upload it to the emulator or to a connected device:

ant release install

That’s all! Those two simple commands can easily be launched from your preferred development environment.

Update: ant compile no longer exists in recent SDKs; replaced with ant release.

Run package

You can use adb to run any application remotely. For instance:

adb shell am start -a android.intent.action.MAIN -n $PACKAGENAME/.$ACTIVITYNAME

Both package name and activity name can be found in your AndroidManifest.xml.

Fuck you, Microsoft: near and far macros

If you target the Windows platform, chances are that your code will have this:

#include <windows.h>

Which in turns includes <windef.h>, which unconditionally defines the following macros:

#define far
#define near

Right. Because there’s no chance in hell that, writing 3D code for Windows, someone’s gonna name any of their variables near or far. Never happens. Never will.

Fuck you, Microsoft, for not even providing a way to disable that monstrosity with a global macro such as NOFUCKINGMACROSFROMTHEEIGHTIES but instead requiring me to #undef those macros after each inclusion of <windows.h>. And it’s not like you don’t know how to do that, because you provide NOMINMAX which deactivates your min() and max() macros in the same fucking file. Fuck you for silently breaking code that compiles cleanly on every platform, Mac OS X, Android or the Playstation.

I refuse to be swayed by your terror tactics and name my variables m_fNearPlaneClipDistance or whatever deranged mind decides is better. My near and far values are called near and far, because I love this naming scheme, and if you don’t, fuck you and your fat wife.

Load PNGs from assets using Android NDK

Many developers appear to embed libpng with their NDK project in order to decode PNGs. While libpng does offer great flexibility, the amount of code necessary to decode an image is surprisingly high, and the additional work needed to maintain a libpng build means that most of the time, using the system’s decoding routines is perfectly reasonable.

But wait, isn’t the NDK for C++ development only? True, but usually we are still running in a virtual machine that has access to a large panel of high-level utility libraries. This article actually demonstrates a broader, useful technique I call return-to-JVM that you can use for other purposes than simply PNG loading.

I suggest putting your PNG files in the assets directory of your application, so that they can be accessed by path.

First, let’s decide of a Java class and object that will act as a PNG factory and manager for us. Let’s call it PngManager:

import android.content.res.AssetManager;

public class PngManager
{
    private AssetManager amgr;

    public Bitmap open(String path)
    {
        try
        {
            return BitmapFactory.decodeStream(amgr.open(path));
        }
        catch (Exception e) { }
        return null;
    }

    public int getWidth(Bitmap bmp) { return bmp.getWidth(); }
    public int getHeight(Bitmap bmp) { return bmp.getHeight(); }

    public void getPixels(Bitmap bmp, int[] pixels)
    {
        int w = bmp.getWidth();
        int h = bmp.getHeight();
        bmp.getPixels(pixels, 0, w, 0, 0, w, h);
    }

    public void close(Bitmap bmp)
    {
        bmp.recycle();
    }
}

Now to load the PNG from the C++ part of the program, use the following code:

jobject g_pngmgr;
JNIEnv *g_env;

/* ... */

char const *path = "images/myimage.png";

jclass cls = g_env->GetObjectClass(g_pngmgr);
jmethodID mid;

/* Ask the PNG manager for a bitmap */
mid = g_env->GetMethodID(cls, "open",
                         "(Ljava/lang/String;)Landroid/graphics/Bitmap;");
jstring name = g_env->NewStringUTF(path);
jobject png = g_env->CallObjectMethod(g_pngmgr, mid, name);
g_env->DeleteLocalRef(name);
g_env->NewGlobalRef(png);

/* Get image dimensions */
mid = g_env->GetMethodID(cls, "getWidth", "(Landroid/graphics/Bitmap;)I");
int width = g_env->CallIntMethod(g_pngmgr, mid, png);
mid = g_env->GetMethodID(cls, "getHeight", "(Landroid/graphics/Bitmap;)I");
int height = g_env->CallIntMethod(g_pngmgr, mid, png);

/* Get pixels */
jintArray array = g_env->NewIntArray(width * height);
g_env->NewGlobalRef(array);
mid = g_env->GetMethodID(cls, "getPixels", "(Landroid/graphics/Bitmap;[I)V");
g_env->CallVoidMethod(g_pngmgr, mid, png, array);

jint *pixels = g_env->GetIntArrayElements(array, 0);

Now do anything you want with the pixels, for instance bind them to a texture.

And to release the bitmap when finished:

g_env->ReleaseIntArrayElements(array, pixels, 0);
g_env->DeleteGlobalRef(array);

/* Free image */
mid = g_env->GetMethodID(cls, "close", "(Landroid/graphics/Bitmap;)V");
g_env->CallVoidMethod(g_pngmgr, mid, png);
g_env->DeleteGlobalRef(png);

This will not work out of the box. There are a few last things to do, which will hugely depend on your global application architecture and are thus left as an exercise to the reader:

  • Store an AssetManager object in PngManager::amgr before the first call to open() is made (for instance by calling Activity::getAssets() upon application initialisation).
  • Store in g_env a valid JNIEnv * value (the JNI environment is the first argument to all JNI methods), either by remembering it or by using jvm->AttachCurrentThread().
  • Store in g_pngmgr a valid jobject handle to a PngManager instance (for instance by calling a JNI method with the instance as an argument).
  • Error checking was totally omitted from the code for the sake of clarity.
  • Some of the dynamically retrieved variables could benefit from being cached.

I hope this can prove helpful!

For a C++-only solution to this problem, see Load pngs from assets in NDK by Bill Hsu.

The Lol Engine Blog

This blog will contain various articles about the development of the Lol Engine, game development and development in general.

Latest posts

  • Posted: 2011-03-02 02:01 (Updated: 2011-03-22 14:35)
  • Author: sam
  • Categories: blog
  • Comments (47)