Tags: c, hash, hash-function, signed-integer

Hashing int16_t to uint64_t


I'm trying to make a hash function for int16_t. The function prototype looks like this:

uint64_t hash_int16_t(const void *key);

So far I've gotten this but I don't know if this is the correct approach:

uint64_t hash_int16_t(const void *key)
{
    // key is expected to be an int16_t
    const int16_t *e = (const int16_t*)key;

    uint64_t x = (uint64_t)*e;

    x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
    x = x ^ (x >> 31);

    return x;
}

Is there a hash function for signed types? Should I mix the bits using 16-bit unsigned integers, or will 64-bit unsigned integers do fine? Will I be losing information when I cast to an unsigned type if the integer is negative? Will this generate undefined behavior?

P.S. The code is in C and I've taken the hash function from here.

Edit 1: The argument is const void *key because the user is allowed to store keys as other values like structs or strings. The above function will add support for int16_t keys.

Edit 2: What I'm trying to accomplish is a generic hash table. The user will have to provide a hash function when initializing the hash table and the example above is bundled with the hash table.


Solution

  • Is there a hash function for signed types?

    Sure. A good hash function that works on unsigned types can also work just fine on signed types. If the hash function is good, then it has good uniformity, and so it doesn't matter whether you call a particular bit a "sign bit" or "just another bit." For the purposes of this answer, I'll take it as given that the algorithm you found in the linked thread is "good."

    Should I mix the bits using 16-bit unsigned integers, or will 64-bit unsigned integers do fine?

    You can't rely on the bit-shift operators to widen a uint16_t to a uint64_t: integer promotion only converts the operand to int, and the multiplications by 64-bit constants need the full 64-bit product. So you must work with uint64_t, as in the code you posted.

    Will I be losing information when I cast to an unsigned type if the integer is negative?

    No, because each possible value of an int16_t maps to a distinct value when converted to a uint64_t: the range [0, 32767] maps to [0, 32767] and the range [-32768, -1] maps to [18446744073709518848, 18446744073709551615] (see below for explanation).

    Will this generate undefined behavior?

    No. The C standard (C11) specifies the following for signed-to-unsigned integer conversion (§6.3.1.3):

    [...] if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

    Thus, -32768 converts to -32768 + 2^64 = 18446744073709518848, and -1 converts to -1 + 2^64 = 18446744073709551615.


    As for the algorithm itself... if the hash value is only being used to index a hash table, then the hash function doesn't need strong statistical properties like dispersion (avalanche). As such, this trivial algorithm might work just fine for an int16_t x:

    return (uint64_t) x;
    

    This function has no dispersion, but (trivially) optimal uniformity for the input and output range. Whether this is acceptable will depend on the hash table implementation. If it naively uses only certain bits of the hash value to select a bin to place the value in, and it doesn't do any mixing of its own, you'll need to focus the uniformity of your output on those bits, wherever/whichever they are.