
# Damping with delta-time

Just a quick tip on how to convert usual damping code to something framerate-independent.

Most of us have probably, at some point, written code resembling this:

// Perform velocity damping
velocity -= velocity * 0.01f;


… or probably the more correct:

// Per-second damping coefficient
float const D = 10.0f;

// Damp velocity according to timestep
velocity -= velocity * D * delta_time;


Yet this is not fully framerate-independent: results still differ slightly between 30 fps and 60 fps, and more importantly, framerate spikes cause all sorts of weird artifacts. Developers then often try to fix the situation by clamping delta_time, which is not ideal.

## The exponentiation method

Here is one way to fix it: assume that the code works correctly at 60 fps. This means that each frame, velocity is effectively multiplied by 1 - D / 60.

After one second, i.e. 60 frames, velocity has been multiplied by (1 - D / 60) ^ 60.

After two seconds, it has been multiplied by (1 - D / 60) ^ (60 * 2).

After N seconds, it has been multiplied by (1 - D / 60) ^ (60 * N).

So there we have a formula that tells us what happens after N seconds, and it is a continuous function. We are therefore free to choose any N we like, in particular N = delta_time:

// Per-second damping coefficient
float const D = 10.0f;

// Damp velocity (framerate-independent)
velocity *= pow(1.f - D / 60.f, 60.f * delta_time);


Which can be conveniently rewritten as:

// Per-second damping coefficient
float const D = 10.0f;
// Exponentiation base for velocity damping
float const D2 = pow(1.f - D / 60.f, 60.f);

// Damp velocity (framerate-independent)
velocity *= pow(D2, delta_time);
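
To check that this really is framerate-independent, here is a small standalone test (a sketch; it simulates one second of damping at several framerates and should print nearly identical values each time):

#include <stdio.h>
#include <math.h>

int main(void)
{
    float const D = 10.0f;
    float const D2 = powf(1.f - D / 60.f, 60.f);

    int const framerates[] = { 30, 60, 144 };
    int const nrates = sizeof(framerates) / sizeof(framerates[0]);

    for (int i = 0; i < nrates; ++i)
    {
        int fps = framerates[i];
        float velocity = 100.0f;
        float delta_time = 1.0f / fps;

        /* Simulate exactly one second of damping */
        for (int frame = 0; frame < fps; ++frame)
            velocity *= powf(D2, delta_time);

        printf("%3d fps: velocity = %f\n", fps, velocity);
    }
    return 0;
}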


## Use with lerp

The same method can be adapted to uses of linear interpolation such as this one:

// Perform velocity damping
velocity = lerp(velocity, target_velocity, K * delta_time);


Which we replace with:

// Damp velocity (framerate-independent)
velocity = lerp(velocity, target_velocity,
                1.f - pow(1.f - K / 60.f, 60.f * delta_time));
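
As with the velocity damping above, the base of the pow is constant and can be precomputed (the same transformation as with D2):

// Per-second lerp coefficient
float const K2 = pow(1.f - K / 60.f, 60.f);

// Damp velocity (framerate-independent)
velocity = lerp(velocity, target_velocity, 1.f - pow(K2, delta_time));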


# The stolen bytes: Visual Studio, virtual methods and data alignment

This article describes a design choice in the C++ ABI of the Visual Studio compiler that I believe should be considered a bug. I propose a trivial workaround at the end.

TL;DR — if the topmost polymorphic class in a hierarchy has members with alignment requirement N where N > sizeof(void *), the Visual Studio compiler may add up to N bytes of useless padding to your objects.

Update: be sure to read the explanation by Jan Gray, who designed the relevant part of the MS C++ ABI some 22 years ago, in the comments section below.

My colleague Benlitz first hit the problem when trying to squeeze memory out of some of our game’s most often instantiated classes. I think it is best illustrated with the following minimal example:

class Foo
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
};

class Bar
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};


This is the size of Foo and Bar on various 32-bit platforms:

| Platform                      | sizeof(Foo) | sizeof(Bar) | Madness? |
|-------------------------------|-------------|-------------|----------|
| Linux x86 (gcc)               | 8           | 16          | no       |
| Linux ARMv9 (gcc)             | 8           | 16          | no       |
| Win32 (gcc)                   | 8           | 16          | no       |
| Win32 (Visual Studio 2010)    | 8           | 24          | yes      |
| Xbox 360 (Visual Studio 2010) | 8           | 24          | yes      |
| PlayStation 3 (gcc)           | 8           | 16          | no       |
| PlayStation 3 (SNC)           | 8           | 16          | no       |
| Mac OS X x86 (gcc)            | 8           | 16          | no       |
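
These figures are easy to reproduce with a trivial test program (a sketch, using the Foo and Bar definitions above):

#include <stdio.h>

int main(void)
{
    printf("sizeof(Foo) = %d\n", (int)sizeof(Foo));
    printf("sizeof(Bar) = %d\n", (int)sizeof(Bar));
    return 0;
}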

There is no trick. This is by design. The Visual Studio compiler is literally stealing 8 bytes from us!

## What the fuck is happening?

This is the memory layout of Foo on all observed platforms:
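
offset 0: vfptr    (4 bytes)
offset 4: float f  (4 bytes)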

The vfptr field is a special pointer to the vtable. The vtable is probably the most widespread compiler-specific way to implement virtual methods. Since all the platforms studied here are 32-bit, this pointer requires 4 bytes. A float requires 4 bytes, too. The total size of the class is therefore 8 bytes.

This is the memory layout of Bar on eg. Linux using GCC:
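
offset 0: vfptr    (4 bytes)
offset 4: float f  (4 bytes)
offset 8: double d (8 bytes)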

The double type has an alignment requirement of 8 bytes, which makes it fit perfectly at byte offset 8.

And finally, this is the memory layout of Bar on Win32 using Visual Studio 2010:
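
offset 0:  vfptr    (4 bytes)
offset 4:  padding  (4 bytes)
offset 8:  float f  (4 bytes)
offset 12: padding  (4 bytes)
offset 16: double d (8 bytes)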

This is madness! The requirement for the class to be 8-byte aligned causes the first element of the class to be 8-byte aligned, too! I demand a rational explanation for this design choice.

The problem is that the compiler decides to add the vtable pointer after it has aligned the class data, resulting in excessive realignment.

## Compilers affected

The Visual Studio compilers for Win32, x64 and Xbox 360 all appear to create spurious padding in classes.

Though this article focuses on 32-bit platforms for the sake of simplicity, 64-bit Windows is affected, too.

The problem becomes even worse with larger alignment requirements, for instance with SSE3 or AltiVec types that require 16-byte storage alignment such as _FP128:

class Quux
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    _FP128 dd;   /* 16 bytes */
};


This is the GCC memory layout on both 32-bit and 64-bit platforms:
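
offset 0:  vfptr     (4 bytes)
offset 4:  float f   (4 bytes)
offset 8:  padding   (8 bytes)
offset 16: _FP128 dd (16 bytes)

Total: 32 bytes (the 64-bit layout only differs in the vfptr being 8 bytes, with the same 32-byte total).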

The padding there is perfectly normal and expected, because of the alignment requirements for dd.

But this is how Visual Studio decides to lay it out:
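
offset 0:  vfptr     (4 bytes)
offset 4:  padding   (12 bytes)
offset 16: float f   (4 bytes)
offset 20: padding   (12 bytes)
offset 32: _FP128 dd (16 bytes)

Total: 48 bytes.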

That is 16 lost bytes, both on 32-bit and 64-bit versions of Windows.

## Workaround

There is fortunately a workaround if you want to get rid of the useless padding. It is so trivial that it actually makes me angry that the problem exists in the first place.

This will get you your bytes back:

class EmptyBase
{
protected:
    virtual ~EmptyBase() {}
};

class Bar : public EmptyBase
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};


And this is the size of Bar on the same 32-bit platforms:

| Platform                      | sizeof(Bar) |
|-------------------------------|-------------|
| Linux x86 (gcc)               | 16          |
| Linux ARMv9 (gcc)             | 16          |
| Win32 (gcc)                   | 16          |
| Win32 (Visual Studio 2010)    | 16          |
| Xbox 360 (Visual Studio 2010) | 16          |
| PlayStation 3 (gcc)           | 16          |
| PlayStation 3 (SNC)           | 16          |
| Mac OS X x86 (gcc)            | 16          |

Phew. Sanity restored.

The compiler is a lot less confused now: it no longer has to create space for a vfptr in Bar since it is technically already part of EmptyBase.

## Conclusion

Lessons learned:

• The pointer to the vtable isn’t just like any other pointer.
• Various C++ ABIs have different stances on padding and alignment.
• Inheriting from an empty polymorphic base class can make your objects smaller on Windows and Xbox 360!
• Design decisions can haunt you for decades!

The workaround is so simple that it sounds like a good idea to always use it, preemptively.

# Setting up a real Compose key on Mac OS X

The Compose key is my method of choice for international character input. It lets me type characters as diverse as é Â ẃ ṗ § … « ¿ ¥ ¹ ½ © × using simple and intuitive key combinations.

How intuitive exactly? Let’s see:

• Compose + C + , gives me Ç.
• Compose + l + / gives me ł.
• Compose + < + < gives me «.
• Compose + - + > gives me →.
• Compose + < + 3 gives me ♥.
• Compose + C + C + C + P gives me ☭ (no kidding).

You can of course set up your own rules. This shit is so powerful that I cannot imagine I could ever use any other input method.

So, one would think that with all its glorious Unix heritage, Mac OS X would let you get the most out of your keyboard like the good old X11 system does. Well it turns out it’s possible, but not straightforward.

Fortunately, other people already did all the work. I will just indicate how to put their stuff together.

## Step 1: choose a Compose key

Choose the Compose key so that it is easily accessible but does not prevent you from doing anything you ordinarily do. Fortunately, modern keyboards come with more and more idiotic and useless keys.

I use the Right Alt key as my Compose key. I already have a Left Alt key so the right one is a bit useless to me. And it somehow matches the position of the Compose key on old Sun keyboards.

That would be Right Option on a Mac keyboard. I recommend that.

## Step 2: remap the Compose key

The problem is that the Mac OS X keyboard preferences:

• do not let you differentiate between Left and Right Option keys
• only let you remap modifier keys to another modifier key (or to nothing)

Fortunately, there is KeyRemap4MacBook, which lets you do very low-level things with your keyboard. Install it.

We will now remap our Compose key to something that the next layer will understand. I chose Shift-Control-F13 for that; it is very unlikely you will ever need that key combination for anything else.

In the file ~/Library/Application Support/KeyRemap4MacBook/private.xml put the following:

<?xml version="1.0"?>
<root>
  <item>
    <name>Send Shift-Ctrl-F13 for Right Option</name>
    <identifier>private.send_shift_ctrl_f13_for_ropt</identifier>
    <autogen>__KeyToKey__ KeyCode::OPTION_R,
      KeyCode::F13, ModifierFlag::SHIFT_L
      | ModifierFlag::CONTROL_L
    </autogen>
  </item>
</root>


Finally, from the System Preferences, open the KeyRemap4MacBook settings and click on the ReloadXML button.

The new option should appear in the list. Activate it.

## Step 3: create compose bindings

The last step is the creation of the actual bindings. I chose to import the rules from /usr/share/X11/locale/en_US.UTF-8/Compose on my Debian system.

Bob Kåres wrote a script that lets you convert X11 compose rules into Cocoa key bindings.

Either convert a Compose file of your own using Bob’s script, or download my DefaultKeyBinding.dict. Save it in ~/Library/KeyBindings/DefaultKeyBinding.dict.
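
To give you an idea of what the script produces, an individual rule in DefaultKeyBinding.dict looks roughly like this (a hypothetical excerpt mapping Compose + ' + e to é; the real file contains hundreds of entries):

{
    "\UF710" = {
        "'" = {
            "e" = ("insertText:", "é");
        };
    };
}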

Be careful: by default, Bob’s script uses F13 instead of Shift-Ctrl-F13, so in DefaultKeyBinding.dict you need to change:

    "\UF710"


into:

    "^\UF710"  If for some reason you decided to go for another combination, check out this article by Xah Lee to find out the proper syntax. ## Step 4: restart all applications And that’s it! Your Mac OS X system is now slightly more usable. • Posted: 2012-06-18 08:18 (Updated: 2015-09-05 19:38) • Author: sam • Categories: a11y osx tip • Comments (2441) # LINK : fatal error LNK1104: cannot open file 'XAPID.lib' Ever got a link error for a library that was referenced nowhere in your Visual Studio project or even in the final link.exe command line? Here's a hint: check the contents of static libraries, too. They may be pulling unexpected dependencies behind your back! If the static library is part of your solution, here is another hint: check that the [Configuration Properties] >> [C/C++] >> [Code Generation] >> [Runtime Library] configuration values match across projects. # Maths trick: doing fewer comparisons Note: this is not an optimisation. It is just one more tool you should have in your toolbox when looking for optimisations. It may be useful. This is the trick: You can check for yourself that it is always true: when x > y, |x - y| is the same as x - y, etc. What good is it for? There is often an implicit comparison in min or max. It might be interesting to replace it with a call to the branchless fabs. ## Example usage Consider the following code: float a, b, c, d; /* ... */ return (a > b) && (c > d);  That kind of code is often used eg. in collision checks, where a lot of tests can be done. This code does two comparisons. On some architectures, this means two branches. Not always something you want. The test condition is equivalent to: (a - b > 0) && (c - d > 0)  Now when are two given numbers both positive? That is if and only if the smallest is positive: min(a - b, c - d) > 0  We may now use our trick: (a - b) + (c - d) - |(a - b) - (c + d)| > 0  And so the code could be rewritten as such: float a, b, c, d; /* ... */ return (a - b) + (c - d) > fabsf((a - b) - (c - d));  We basically replaced the additional test with a call to fabsf and some additions/subtractions. It may be possible to reorganise the input data so that this second version performs better. # C++ trick: selectively restrict implicit conversions TL;DR: given a class Foo with an implicit constructor from int, how to allow the implicit conversion in f(42); but not in g(42); where both f and g take a Foo const & argument? ## Background So I have this real class that performs numeric operations that I want use just like any other C++ numeric type. For instance, I can write the following: float f = 15, g = 3.5; int x = f / g;  If I decide that I need double precision, I can write: double f = 15, g = 3.5; int x = f / g;  And of course, using my real class for even higher precision: real f = 15, g = 3.5; int x = f / g;  I like that. I can just write code as usual, and when I need higher precision, I use real instead of double. It's transparent and convenient. ## Implementation example Here is a highly simplified example of a real class: struct real { inline real(double d) : m_value(d) {} inline operator int() const { return (int)m_value; } /* ... */ long double m_value; };  It is possible to write real f = 15 because of the implicit constructor. Actually, C++ constructors are always implicit unless specified otherwise. It is possible to write int x = f / g because of the conversion operator. So far, so good. 
## The problem with implicit promotion

Here is how fabs could be implemented:

real fabs(real const &r)
{
    return real(r.m_value < 0 ? -r.m_value : r.m_value);
}

But now we have a problem. A subtle problem. Consider the following code:

double x = fabs(-5.0);

What does this do? Well, it depends. It depends on whether <cmath> was included or not. Because if <cmath> wasn’t included, then that code is going to automatically promote -5.0 to a real and call our custom function instead of the one provided by the math library! With no compile-time warning!

This is confusing. It should not happen. But it is a well-known problem and there are several obvious workarounds:

1. What most professional C++ programmers will tell you: use namespaces
2. Mark the real(double) constructor explicit

The problem with 1. is that I am not a professional C++ programmer. I am a C programmer who uses C++. I use preprocessor macros and printf and memalign and goto. Try and stop me!

The problem with 2. is that I could no longer write real f = 15; I would need real f(15) or real f = real(15) instead. This is not acceptable: I want real to behave exactly like float and others, to the fullest extent of what the language allows.

## Another solution

Fortunately, the C++ standard has a solution for us: "Implicit conversions will be performed [...] if the parameter type contains no template-parameters that participate in template argument deduction" (ISO/IEC 14882:1998, section 14.8.1.4). You cannot have implicit conversion and template argument deduction at the same time. It means we just have to make fabs a template function! Which means making real a template class, too.

A quick way to fix real would be:

/* N is unused */
template<int N>
struct real_base
{
    inline real_base(double d) : m_value(d) {}
    inline operator int() const { return (int)m_value; }
    /* ... */
    long double m_value;
};

typedef real_base<0> real;

The template argument is useless, unfortunately. It will just have to be there, forever. But who knows, you might find a use for it one day.

And to fix fabs:

/* A generic template declaration is needed */
template<int N> real_base<N> fabs(real_base<N> const &r);

/* Here we just add template<> to the previous version */
template<>
real fabs(real const &r)
{
    return real(r.m_value < 0 ? -r.m_value : r.m_value);
}

So, what happens with double x = fabs(-5.0); when we forget to include <cmath> now? Well, here is what GCC says:

In function ‘int main()’:
error: no matching function for call to ‘fabs(double)’
note: candidate is:
note: template<int N> real_base<N> fabs(const real_base<N>&)

It seems we’ve successfully managed to avoid the problematic implicit conversion, yet still allow it in places where it was useful!

So what is the rule? It’s simple: where implicit conversion should not be allowed, make the function a specialised template function.

# C/C++ trick: static string hash generation

I am always interested in having the compiler do more things for me, without giving away code clarity or performance for the sake of convenience. Today a colleague linked me to Pope Kim's Compile-Time Hash String Generation article, which is a perfect example of the things I like: hidden syntactic sugar that does useful things.

## Inline hash function

The goal: for a given hash function, write something like HASH_STRING("funny_bone") in the code, and have the compiler directly replace it with the result, 0xf1c6fd7f.
The solution: inline the function and hope that the compiler will be clever enough.

#include <string.h>

#define HASH(str) generateHash(str, strlen(str))

inline unsigned int generateHash(const char *string, size_t len)
{
    unsigned int hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash = 65599 * hash + string[i];
    return hash ^ (hash >> 16);
}

Unfortunately Pope ran into several very problematic issues:

• requires heavy optimisation flags (/O2 with Visual Studio, -O3 with g++)
• limited to 10-character strings with Visual Studio
• limited to 17-character strings with g++

I could personally reproduce the g++ limitations. I believe they are more related to loop unrolling limits than to the actual string size, but they indeed make the technique unusable in practice.

## Macro-based hash function

If you read my previous article about C/C++ preprocessor LUT generation, you may remember that it used preprocessor tricks to do loop unrolling. Hence the following implementation:

#include <string.h>
#include <stdint.h>
#include <stdio.h>

#define H1(s,i,x)   (x*65599u+(uint8_t)s[(i)<strlen(s)?strlen(s)-1-(i):strlen(s)])
#define H4(s,i,x)   H1(s,i,H1(s,i+1,H1(s,i+2,H1(s,i+3,x))))
#define H16(s,i,x)  H4(s,i,H4(s,i+4,H4(s,i+8,H4(s,i+12,x))))
#define H64(s,i,x)  H16(s,i,H16(s,i+16,H16(s,i+32,H16(s,i+48,x))))
#define H256(s,i,x) H64(s,i,H64(s,i+64,H64(s,i+128,H64(s,i+192,x))))

#define HASH(s) ((uint32_t)(H256(s,0,0)^(H256(s,0,0)>>16)))

It has the following properties:

• works in C in addition to C++
• strings are always optimised away by gcc or g++ (but not always the computation itself)
• hash computation is optimised away by gcc or g++ even with -O, -O1 or -Os
• string size limit is 256 characters (probably more than enough for most uses) and can be manually increased or decreased

The following code:

int main(void)
{
    printf("%08x\n", HASH("funny_bone"));
    printf("%08x\n", HASH("incredibly_large_string_that_gcc_groks_easily"));
}

Is (correctly) optimised to this with gcc -Os:

    ...
    movl    $-238617217, %esi
    movl    $.LC0, %edi
    xorl    %eax, %eax
    call    printf
    movl    $-453669173, %esi
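
# Build and run Android NDK applications from the command line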
PATH="$PATH:/home/sam/android/android-sdk-linux_x86/tools"

This is best defined in one of your shell’s startup scripts such as .zshenv.

## Build and install package

Now, whenever you are in an NDK project’s directory, build the project using:

ndk-build && ant release

And to upload it to the emulator or to a connected device:

ant release install

That’s all! Those two simple commands can easily be launched from your preferred development environment.

Update: ant compile no longer exists in recent SDKs; it was replaced with ant release.

## Run package

You can use adb to run any application remotely. For instance:

adb shell am start -a android.intent.action.MAIN -n $PACKAGENAME/.$ACTIVITYNAME


Both package name and activity name can be found in your AndroidManifest.xml.
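
For instance, with a hypothetical package com.example.mygame and activity MyActivity, the command becomes:

adb shell am start -a android.intent.action.MAIN -n com.example.mygame/.MyActivity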

# Load PNGs from assets using Android NDK

Many developers appear to embed libpng with their NDK project in order to decode PNGs. While libpng does offer great flexibility, the amount of code necessary to decode an image is surprisingly high, and the additional work needed to maintain a libpng build means that most of the time, using the system’s decoding routines is perfectly reasonable.

But wait, isn’t the NDK for C++ development only? True, but the application still runs inside a virtual machine that has access to a large set of high-level utility libraries. This article actually demonstrates a broader, useful technique I call return-to-JVM, which can be used for many purposes other than PNG loading.

I suggest putting your PNG files in the assets directory of your application, so that they can be accessed by path.

First, let’s settle on a Java class and object that will act as a PNG factory and manager for us. Let’s call it PngManager:

import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

public class PngManager
{
    private AssetManager amgr;

    public Bitmap open(String path)
    {
        try
        {
            return BitmapFactory.decodeStream(amgr.open(path));
        }
        catch (Exception e) { }
        return null;
    }

    public int getWidth(Bitmap bmp) { return bmp.getWidth(); }
    public int getHeight(Bitmap bmp) { return bmp.getHeight(); }

    public void getPixels(Bitmap bmp, int[] pixels)
    {
        int w = bmp.getWidth();
        int h = bmp.getHeight();
        bmp.getPixels(pixels, 0, w, 0, 0, w, h);
    }

    public void close(Bitmap bmp)
    {
        bmp.recycle();
    }
}


Now to load the PNG from the C++ part of the program, use the following code:

jobject g_pngmgr;
JNIEnv *g_env;

/* ... */

char const *path = "images/myimage.png";

jclass cls = g_env->GetObjectClass(g_pngmgr);
jmethodID mid;

/* Ask the PNG manager for a bitmap */
mid = g_env->GetMethodID(cls, "open",
                         "(Ljava/lang/String;)Landroid/graphics/Bitmap;");
jstring name = g_env->NewStringUTF(path);
jobject png = g_env->CallObjectMethod(g_pngmgr, mid, name);
g_env->DeleteLocalRef(name);
/* NewGlobalRef returns the global reference; keep it */
png = g_env->NewGlobalRef(png);

/* Get image dimensions */
mid = g_env->GetMethodID(cls, "getWidth", "(Landroid/graphics/Bitmap;)I");
int width = g_env->CallIntMethod(g_pngmgr, mid, png);
mid = g_env->GetMethodID(cls, "getHeight", "(Landroid/graphics/Bitmap;)I");
int height = g_env->CallIntMethod(g_pngmgr, mid, png);

/* Get pixels */
jintArray array = g_env->NewIntArray(width * height);
/* Again, keep the returned global reference */
array = (jintArray)g_env->NewGlobalRef(array);
mid = g_env->GetMethodID(cls, "getPixels", "(Landroid/graphics/Bitmap;[I)V");
g_env->CallVoidMethod(g_pngmgr, mid, png, array);

jint *pixels = g_env->GetIntArrayElements(array, 0);


Now do anything you want with the pixels, for instance bind them to a texture.
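
For instance, uploading the pixels to an OpenGL texture could look like the following sketch. Note that Bitmap.getPixels() returns ARGB-packed ints while GL_RGBA expects RGBA byte order, so depending on your target a channel swizzle may be needed first:

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);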

And to release the bitmap when finished:

g_env->ReleaseIntArrayElements(array, pixels, 0);
g_env->DeleteGlobalRef(array);

/* Free image */
mid = g_env->GetMethodID(cls, "close", "(Landroid/graphics/Bitmap;)V");
g_env->CallVoidMethod(g_pngmgr, mid, png);
g_env->DeleteGlobalRef(png);


This will not work out of the box. There are a few last things to do, which will hugely depend on your global application architecture and are thus left as an exercise to the reader:

• Store an AssetManager object in PngManager::amgr before the first call to open() is made (for instance by calling Activity::getAssets() upon application initialisation).
• Store in g_env a valid JNIEnv * value (the JNI environment is the first argument to all JNI methods), either by remembering it or by using jvm->AttachCurrentThread().
• Store in g_pngmgr a valid jobject handle to a PngManager instance (for instance by calling a JNI method with the instance as an argument); a minimal sketch of this approach is shown after this list.
• Error checking was totally omitted from the code for the sake of clarity.
• Some of the dynamically retrieved variables could benefit from being cached.
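
Here is one possible way to initialise g_env and g_pngmgr. This is only a sketch: the nativeInit method is hypothetical, the JNI symbol name assumes PngManager lives in the default package, and a JNIEnv pointer is only valid on the thread it was obtained on:

#include <jni.h>

/* The globals used by the loading code above */
extern JNIEnv *g_env;
extern jobject g_pngmgr;

/* Hypothetical native method; declare it in PngManager as
 * "private native void nativeInit();" and call it once at startup */
extern "C" JNIEXPORT void JNICALL
Java_PngManager_nativeInit(JNIEnv *env, jobject thiz)
{
    g_env = env;
    /* Keep a global reference so the GC cannot collect the manager */
    g_pngmgr = env->NewGlobalRef(thiz);
}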

I hope this can prove helpful!

For a C++-only solution to this problem, see Load pngs from assets in NDK by Bill Hsu.