I am looking at solutions to the set-bit-count problem (given a number, how do you efficiently count how many of its bits are set?).
Here, http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive, I have found some methods.
What about the lookup table method? I don't understand what properties of the binary representation make it work.
static const unsigned char BitsSetTable256[256] =
{
#   define B2(n) n, n+1, n+1, n+2
#   define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
#   define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
    B6(0), B6(1), B6(1), B6(2)
};
unsigned int v; // count the number of bits set in 32-bit value v
unsigned int c; // c is the total bits set in v
// Option 1:
c = BitsSetTable256[v & 0xff] +
    BitsSetTable256[(v >> 8) & 0xff] +
    BitsSetTable256[(v >> 16) & 0xff] +
    BitsSetTable256[v >> 24];
// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
    BitsSetTable256[p[1]] +
    BitsSetTable256[p[2]] +
    BitsSetTable256[p[3]];
// To initially generate the table algorithmically:
BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
    BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}
In particular, I don't understand the BitsSetTable256 definition at first. Why define these quantities B2, B4, ...? It seems to me that they are not used afterwards.
Could you point me to further documentation on binary representation?
Thanks!
The definitions are there to build the table by a pattern. They are recursive macros: B6 uses B4, and B4 uses B2 (the names refer to how many low-order bits each block enumerates: B2 expands to 2^2 = 4 entries, B4 to 16, B6 to 64, and the four B6 blocks give all 256 byte values). B6(0) gets expanded into:
B4(0), B4(1), B4(1), B4(2)
B4(0) in turn expands into B2(0), B2(1), B2(1), B2(2), and B2(0) expands into:
0, 1, 1, 2
The first few numbers of the sequence will be:
// 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3
As you can see, these are the number of bits set for each index in the table.
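If you want to convince yourself, here is a small self-contained check (my own sketch, not from the linked page) that builds the table with the same macros and compares every entry against a naive bit count:
#include <stdio.h>

static const unsigned char BitsSetTable256[256] =
{
#   define B2(n) n, n+1, n+1, n+2
#   define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
#   define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
    B6(0), B6(1), B6(1), B6(2)
};

/* Count the bits of i the slow, obvious way, for comparison. */
static unsigned naive_popcount(unsigned i)
{
    unsigned c = 0;
    for (; i; i >>= 1)
        c += i & 1;
    return c;
}

int main(void)
{
    for (int i = 0; i < 256; i++)
        if (BitsSetTable256[i] != naive_popcount((unsigned)i))
            printf("mismatch at %d\n", i);
    printf("table checked\n");
    return 0;
}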
The rest of the algorithm breaks the 32-bit number into four 8-bit chunks and sums the number of bits set in each chunk; that's what these lines do (you use either Option 1 or Option 2, at your liking, not both):
// Option 1:
c = BitsSetTable256[v & 0xff] +
    BitsSetTable256[(v >> 8) & 0xff] +
    BitsSetTable256[(v >> 16) & 0xff] +
    BitsSetTable256[v >> 24];
// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
    BitsSetTable256[p[1]] +
    BitsSetTable256[p[2]] +
    BitsSetTable256[p[3]];
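For a concrete example, here is Option 1 wrapped in a helper (the name popcount32 is mine; it assumes the BitsSetTable256 array from above is in scope). For v = 0xF00F0001 the four bytes are 0x01, 0x00, 0x0F, 0xF0, so the result is 1 + 0 + 4 + 4 = 9:
static unsigned popcount32(unsigned v)
{
    return BitsSetTable256[ v        & 0xff] +  /* bits  0..7  */
           BitsSetTable256[(v >>  8) & 0xff] +  /* bits  8..15 */
           BitsSetTable256[(v >> 16) & 0xff] +  /* bits 16..23 */
           BitsSetTable256[ v >> 24        ];   /* bits 24..31 */
}

/* popcount32(0xF00F0001u) == 1 + 0 + 4 + 4 == 9 */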
The code at the bottom:
// To initially generate the table algorithmically:
BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
    BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}
is a different way of generating BitsSetTable256: it builds the table at runtime instead of at compile time (which is what the macro definitions do).
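The reason the loop works: i / 2 is i with its lowest bit shifted out, so bits(i) = (i & 1) + bits(i / 2), and bits(i / 2) was already computed on an earlier iteration. A quick sketch of my own (note it writes into a non-const copy of the table, since the version above is declared const):
#include <stdio.h>

static unsigned char table[256];   /* mutable copy; the table above is const */

int main(void)
{
    table[0] = 0;
    for (int i = 1; i < 256; i++)
        table[i] = (unsigned char)((i & 1) + table[i / 2]);  /* low bit + bits of i>>1 */

    /* Example: 13 = 0b1101 -> (13&1) + table[6]
                             = 1 + (6&1) + table[3]
                             = 1 + 0 + (3&1) + table[1] = 1 + 0 + 1 + 1 = 3 */
    printf("%u\n", table[13]);   /* prints 3 */
    return 0;
}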
P.S. If you're targeting a recent enough x86 (SSE4.2/POPCNT-capable), you can use the POPCNT instruction.
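For example, with GCC or Clang you can use the __builtin_popcount builtin, which compiles down to a single POPCNT when the target supports it (e.g. built with -mpopcnt, -msse4.2, or -march=native); MSVC has a __popcnt intrinsic in <intrin.h>:
#include <stdio.h>

int main(void)
{
    unsigned v = 0xF00F0001u;
    /* Emits POPCNT when the target allows it; otherwise the compiler
       falls back to a software implementation. */
    printf("%d\n", __builtin_popcount(v));   /* prints 9 */
    return 0;
}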