hashhashcodefnv

Is this fixup to FNV-1a hash a good idea or a bad idea?


algorithm fnv-1 is
    hash := FNV_offset_basis

    for each byte_of_data to be hashed do
        hash := hash XOR byte_of_data
        hash := hash × FNV_prime
        hash := hash XOR multiply_carry

    return hash 

Where multiply_carry is the high half of the multiplication result that doesn't fit in the variable hash?

Basically there's two problems with FNV; 1) the tendency to increase the number of set bits due to multiplicative bias, and 2) the low bit problem where the low bit is the XOR of only the low bits of the input.

My experience with lesser hashes (namely those that depend on MOD prime number at the end) says that this reliably fixes the bit bias but I'm not actually sure this works here.

Lifting the code from recursive's answer and expanding out the definition of FNV1a yields:

const uint FnvOffsetBasis32 = 2166136261;
const uint FnvPrime32 = 0x01000193;
long totalPop = 0;

for (long i = 0; i <= uint.MaxValue; i++)
{
    uint hash = FnvOffsetBasis32;
    hash ^= (uint)(i) & 255;
    hash *= FnvPrime32;
    hash ^= (uint)(i >> 8) & 255;
    hash *= FnvPrime32;
    hash ^= (uint)(i >> 16) & 255;
    hash *= FnvPrime32;
    hash ^= (uint)(i >> 24) & 255;
    hash *= FnvPrime32;
    //totalPop += uint.PopCount((uint)i * FnvPrime32);
    totalPop += uint.PopCount(hash);
}

const long totalInts = 0x1_0000_0000;

// Expected population: 68,719,476,736
Console.WriteLine("Expected population: {0:0,0}", 16 * totalInts);

// Actual population: 68,719,435,411
Console.WriteLine("Actual population: {0:0,0}", totalPop);

Note despite the fact we're kicking around C# demo code, this question isn't in C#. The actual usage is for page-aligned hashtables.


Solution

  • For the next person coming by here; the immediate answer, is no, this doesn't work.

    The bit bias in FNV-1a is real. The bit bias in the low 16 bits is worse than in the upper bits.

    However, the proposed fix makes it worse. The simple transform that works wonders on mod-prime hashes just doesn't apply; must be something to with the FNV primes.

    I'm still looking for really good ides, but they aren't easy to come by.