Posts for the month of April 2012

# LINK : fatal error LNK1104: cannot open file 'XAPID.lib'

Ever got a link error for a library that was referenced nowhere in your Visual Studio project or even in the final `link.exe` command line? Here's a hint: check the contents of static libraries, too. They may be pulling unexpected dependencies behind your back!

If the static library is part of your solution, here is another hint: check that the [Configuration Properties] >> [C/C++] >> [Code Generation] >> [Runtime Library] configuration values match across projects.

# Beyond De Bruijn: fast binary logarithm of a 10-bit number

Recently I needed a method for retrieving the binary logarithm of a 10-bit number (for the curious, it was for the purpose of converting between 32-bit and 16-bit floating point numbers).

Computing the binary logarithm is equivalent to knowing the position of the highest order set bit. For instance, `log2(0x1)` is 0 and `log2(0x100)` is 8.

One well known method for fast binary logarithm is presented at Bit Twiddling Hacks. It is a two-step method where first all lower bits are set to 1 and then a De Bruijn-like sequence is used to perform a table lookup:

```int fastlog2(uint32_t v)
{
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
}
```

That is 12 integer operations and a table lookup.

## Optimising

It should be obvious what the sequence of operations on `v` does: fill the integer with ones starting from the highest order bit. Here are a few examples of what happens to `v` at each step:

 `v` `v |= v >> 1` `v |= v >> 2` `v |= v >> 4` `v |= v >> 8` `v |= v >> 16` 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0002 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0004 0x0006 0x0007 0x0007 0x0007 0x0007 0x0100 0x0180 0x01e0 0x01fe 0x01ff 0x01ff 0x80000000 0xc0000000 0xf0000000 0xff000000 0xffff0000 0xffffffff

There is one obvious optimisation available: since the input is only 10-bit, the last shift operation `v |= v >> 16` can be omitted because the final value was already reached.

```int fastlog2(uint32_t v)
{
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;

return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
}
```

10 instructions instead of 12. Not really amazing, but worth mentioning.

## Optimising more?

Could we do better? Now the last line is `v |= v >> 8;` and it is only useful to propagate the 9th and 10th bits to positions 1 and 2. What happens if we omit that line? Let’s see:

• For most values of `v`, the expected value is obtained.
• For values of `v` with a highest order bit at 9th position, `0x1fe` could be obtained instead of `0x1ff`.
• For values of `v` with a highest order bit at 10th position, one of `0x3fc`, `0x3fd` or `0x3fe` could be obtained instead of `0x3ff`.

The list of possible output values would therefore be `0x1`, `0x3`, `0x7`, `0xf`, `0x1f`, `0x3f`, `0x7f`, `0xff`, `0x1fe`, `0x1ff`, `0x3fc`, `0x3fd`, `0x3fe`, `0x3ff`. What happens to these values when multiplying them with the De Bruijn sequence? Let's see:

 `v` `v * 0x07C4ACDDU) >> 27` 0x1 0 0x3 2 0x7 6 0xf 14 0x1f 30 0x3f 29 0x7f 27 0xff 23 0x1fe 15 0x1ff 16 0x3fc 30 0x3fd 31 0x3fe 0 0x3ff 1

Damn! Two values are colliding. It looks like we cannot omit the last line after all.

## Beyond De Bruijn

Let’s give another try at the problem. Usually De Bruijn sequences are built using either nontrivial algorithms, or brute force. Maybe we could find another sequence that has no collision? Or a sequence that is not a De Bruijn sequence but that works for our problem?

Well, let’s just brute force!

(2 seconds later)

```int fastlog2(uint32_t v)
{
static const int MagicTable[16] =
{
0, 1, 2, 8, -1, 3, 5, 9, 9, 7, 4, -1, 6, -1, -1, -1
};

v |= v >> 1;
v |= v >> 2;
v |= v >> 4;

return MagicTable[(uint32_t)(v * 0x5a1a1a2u) >> 28];
}
```

Down to 8 instructions instead of 12. And the lookup table is now half the size!

## Conclusion

It is possible for multiply-and-shift techniques similar to the De Bruijn sequence algorithm to exist for a larger set of problems. Brute forcing the search is a totally valid method for 32-bit multiplication.

The code used for this article is included in the attached file.