cbit-manipulation

Finding position of zero nibble in 64 bits


I am given a 64bit number, made up of 16 nibbles (chunks of 4 bits). I know that there is a single nibble within them that is zero (0b0000) and I need to find it's index (0-15). My constraints are that I cannot use iteration, loops or branching (if/else).

I can only use bit manipulation, but I do have access to builtin functions such as __builtin_ctzll or __builtin_ffsll (I am using GCC).

I don't have very good grasp on bit manipulation, so so far I've only tried the following:

x |= x >> 1;
x |= x >> 2;

This seems to give some sort of idea for where the target nibble is but I can't really find the next step and it has plenty of cases where it doesn't work as intended...

I've also looked at an extremely similar question: Fast lookup of a null nibble in a 64 bit unsigned

The answer provided suggests that the following function should work:

int zero_nibble_position (uint64_t a)
{
    const uint64_t nibble_lsb = 0x1111111111111111ULL;
    const uint64_t nibble_msb = 0x8888888888888888ULL; 
    uint64_t t = (a - nibble_lsb) & (~a & nibble_msb);
    return __builtin_clzll(t)/4;
}

For some inputs it does work, but sometimes not. For example, 0b0001000000111010010101100111001010010001010011001101111011111011 (1169342103320059643) should return 1, but instead returns 0.

In general, as soon as the 0b0000 nibble is after a 0b0001 it gets the index wrong


Solution

  • A property of Alan Mycroft's null-byte detection algorithm (adapted for nibbles here) is that this trick reliably detects the lowest (least-significant) zero, but the bits of the result to the left (more-significant bits) of that zero do not reliably indicate zeroes or non-zeroes.

    When you use a bit scan that counts from the least significant bit up, as the linked answer does, then that doesn't matter: either there is no zero (and so no troublesome bits), or there is a zero and a bit scan from the least significant bit up will ignore the unreliable part of the result.

    Using a scan from the most significant bit down, such as __builtin_clzll, is affected by the unreliable part of the result. So these things cannot be combined.