cbit-manipulationstrlenmagic-numbers

How to determine if a byte is null in a word


I am reading the "strlen" source code from the glibc, and the trick developers found to speed it up is to read n bytes where n is the size of a long word, instead of reading 1 byte at each iteration.

I will assume that a long word has 4 bytes.

The tricky part is that every "chunk" of 4 bytes the function reads can contain a null byte, so at each iteration, the function has to check if there was a null byte in the chunk. They do it like

if (((longword - lomagic) & ~longword & himagic) != 0) { /* null byte found */ }

where longword is the chunk of data and himagic and lowmagic are magical values defined as:

himagic = 0x80808080L;
lomagic = 0x01010101L;

Here is the comment for thoses values

/* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
 the "holes."  Note that there is a hole just to the left of
 each byte, with an extra at the end:

 bits:  01111110 11111110 11111110 11111111
 bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

 The 1-bits make sure that carries propagate to the next 0-bit.
 The 0-bits provide holes for carries to fall into.  */

How does this trick of finding the null byte work?


Solution

  • From the famous "Bit Twiddling Hacks" page By Sean Eron Anderson, a description of what is currently used in the glibc implementation you're referring to (Anderson calls the algorithm hasless(v, 1)):

    The subexpression (v - 0x01010101UL), evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two sub-expressions the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first sub-expression are masked off by the second.

    It appears that the comment(s) in the glibc source is confusing because it doesn't apply to what the code is actually doing anymore - it's describing what would have been an implementation of the algorithm that Anderson describes just before the hasless(v, 1) algorithm is described.