I am learning bit manipulation and have come across the following code. I'm trying to work out what it does:
long func_c(unsigned long x) {
long val = 0;
// sums of bits in x (done in parallel for each byte segment)
for (int i = 0; i < 8; i++) {
val += (x & 0x0101010101010101L);
x >>= 1;
}
// adds the top 32 bits to the lowest 32 but why?
val += (val >> 32);
// adds the top 16 bits to the lowest 16 but why?
val += (val >> 16);
// adds the top 8 bits to the lowest 8 but why?
val += (val >> 8);
// gets the lowest byte
return val & 0xff;
}
What are the series of right shifts trying to achieve?
First and most important, the code is incorrect. The three lines at the end of the function should be:
val += (val >> 32);
val += (val >> 16);
val += (val >> 8);
Your comments were correct, but the code was not. With the above change, the function works correctly.
The for
loop creates in val
a set of 8 bytes, each one of which contains the number of '1' bits in the corresponding byte of x
.
So remember, we are trying to get the number of '1' bits in the original x
, and so far all we have is byte 0 of val
containing the number of '1' bits in byte 0 of x, byte 1 of val is the number of '1' bits in byte 1 of x, etc. So we want to add together the 8 bytes of val
.
The code gets some additional parallelism by adding the upper 4 bytes of val
to the lower 4 bytes of val
. Remember that each byte has a maximum value of 8, so when you add these two 32-bit values together, each byte adds to the corresponding other byte without overflow. So the 32-bit result now has partial sums of the bytes of the original val
.
Then, get some more parallelism by splitting the 32-bit result into two 16-bit values and adding them together. Once again, the corresponding bytes get added together without overflow.
The final add of the two halves of the 16-bit number gives you the total number of '1' bits in the original x
.