C/C++ trick: static lookup table generation

There are two major ways of using lookup tables (LUTs) in C/C++ code:

  • build them at runtime,
  • embed them in the code.

One major advantage of runtime initialisation is the choice between static initialisation (at program startup), or lazy initialisation (on demand) to save memory. Also, the generating code can be complex, or use information that is only available at runtime.

In the case of an embedded table, the generation cost is only at compile time, which can be very useful. Also, the compiler may take advantage of its early knowledge of the table contents to optimise code. However, quite often the content of embedded tables is abstruse and hardly useful to someone viewing the code. Usually this is due to the use of an external program for generation, sometimes in a completely different language. But the generation can also often be done using the C/C++ preprocessor.

Practical example

Consider the bit interleaving routine at Sean Eron Anderson's Bit Twiddling Hacks page (which, by the way, I recommend you read and bookmark). It uses the following LUT (shortened for brevity):

static const unsigned short MortonTable256[256] = 
{
  0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015, 
  0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055, 
  0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115, 
  ... 32 lines in total ...
  0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

The MortonTable256 table has, as its name suggests, 256 elements. It was pregenerated by some external piece of code which probably looked like this:

for (int i = 0; i < 256; i++)
    MortonTable256[i] = (i & 1) | ((i & 2) << 1) | ((i & 4) << 2) | ((i & 8) << 3);

The problem with that external piece of code is that it is external. You cannot write it in this form and have it fill the table at compile time.

If you only take the output of this table, the information on how the table was created is lost. It makes it impractical to build another table that has, for instance, all values shifted one bit left. Even if such a table was created using a modified version of the above code, switching between the two tables would be a hassle unless both versions were kept between preprocessor tests.

Preprocessor iterator

Here is one way to get the best of both worlds. First, declare the following iterator macros. They can be declared somewhere in a global .h, maybe with more descriptive names:

#define S4(i)    S1((i)),   S1((i)+1),     S1((i)+2),     S1((i)+3)
#define S16(i)   S4((i)),   S4((i)+4),     S4((i)+8),     S4((i)+12)
#define S64(i)   S16((i)),  S16((i)+16),   S16((i)+32),   S16((i)+48)
#define S256(i)  S64((i)),  S64((i)+64),   S64((i)+128),  S64((i)+192)
#define S1024(i) S256((i)), S256((i)+256), S256((i)+512), S256((i)+768)

Their purpose is simple: calling eg. S16(i) will expand to S1(i), S1(i+1), …, S1(i+15). Similarly, S256(i) will call S1 with values from i to i + 255 times.

And this is how to use them in our example:

static const unsigned short MortonTable256[256] = 
{
#define S1(i) ((i & 1) | ((i & 2) << 1) | ((i & 4) << 2) | ((i & 8) << 3))
    S256(0)
#undef S1
};

That's it! The table will be built at compile time, and you get to keep the logic behind it.

A more complex example

Jeroen van der Zijp's fast half float conversions paper describes table-based methods to convert between 16-bit and 32-bit floating point values. The construction of one of the LUTs is as follows:

void generatetables(){
  for(unsigned int i=0; i<256; ++i){
    int e=i-127;
    if(e<-24){                  // Very small numbers map to zero
      basetable[i|0x000]=0x0000;
      basetable[i|0x100]=0x8000;
    } else if(e<-14){             // Small numbers map to denorms
      basetable[i|0x000]=(0x0400>>(-e-14));
      basetable[i|0x100]=(0x0400>>(-e-14)) | 0x8000;
    } else if(e<=15){             // Normal numbers just lose precision
      basetable[i|0x000]=((e+15)<<10);
      basetable[i|0x100]=((e+15)<<10) | 0x8000;
    } else if(e<128){             // Large numbers map to Infinity
      basetable[i|0x000]=0x7C00;
      basetable[i|0x100]=0xFC00;
    } else{                       // Infinity and NaN's stay Infinity and NaN's
      basetable[i|0x000]=0x7C00;
      basetable[i|0x100]=0xFC00;
    }
  }
}

And this is the compile-time version :

static uint16_t const basetable[512] =
{
#define S1(i) (((i) < 103) ? 0x0000 : \
               ((i) < 113) ? 0x0400 >> (0x1f & (113 - (i))) : \
               ((i) < 143) ? ((i) - 112) << 10 : 0x7c00)
    S256(0),
#undef S1
#define S1(i) (0x8000 | basetable[i])
    S256(0),
#undef S1
};

In this case the macro code is slightly bigger and was slightly rewritten, but is no more complicated than the original code. Note also the elegant reuse of previous values in the second half of the table.

This trick is certainly not new, but since I have found practical uses for it, I thought you may find it useful, too.

  • Posted: 2011-12-20 22:21 (Updated: 2011-12-20 22:26)
  • Author: sam
  • Categories: code tip

Comments

1. anonymous -- 2011-12-21 11:21

An example of a trig lookup table (Taylor series maybe) would be excellent.

2. anonymous -- 2011-12-22 14:42

U should start using D Programming-Language ;).....

3. sam -- 2011-12-22 16:43

@anonymous #1: unfortunately there are two problems with this. First, most parameters for trigonometric functions would at some point use floating point computation, and constant expressions are only available in C++11. Second, I believe Taylor series are not a good solution and better polynomials should be preferred, but the complexity of these polynomials is such that I believe this kind of coefficients are best computed in a separate pass.

4. iliis.junk@gmail.com -- 2013-11-19 15:11

Thanks, this is indeed useful. Something that you may have to look out for is operator precedence:

#define S1(i)  i*10

This will produce wrong results, as i is first replaced by i+x and the evaluation happens afterwards:

S256(0) -> 0*10, 0+1*10, ... , 0+64+1*10, ...

In short: Always surround i with brackets when defining S1:

#define S1(i)  (i)*10
287. This Site -- 2017-02-01 07:26

30YGOE This blog is very good! How did you make it !?

288. Learn More -- 2017-02-01 11:03

7ZjvrB Wow, incredible blog format! How lengthy have you ever been running a blog for? you make blogging look easy. The whole glance of your website is great, as well as the content!

344. XRumerTest -- 2017-04-03 02:05

Hello. And Bye.

Add New Comment