chashtablehash-collisiondouble-hashing

What is a propper way of calculating the size of a hash table


I am bullding a hash table using double hashing to solve collision. How can I know what is the propper size? I know it has to prime to minmize the number of collisions.


Solution

  • The easiest way to implement hash tables is to use a power-of-2 sized hash tables.

    The reason is that if N = 2M, then calculating H % N is as simple as calculating H & (N - 1).

    With fast hash functions such as MurmurHash3_32, the slowest part of using the hash table is actually calculating the modulo. H & (N - 1) does not calculate modulo but bitwise AND which is much faster (and it's the same as modulo if N is a power of 2).

    Somebody could validly claim that MurmurHash suffers from seed-independent multicollisions and therefore is susceptible to a hash collision denial of service attack. That's true, but you shouldn't use linked lists to resolve hash collisions. You should use only hash tables where the keys are sortable by some comparison function (larger than, equal, smaller than) and then you can use red-black trees (or AVL trees) to resolve hash collisions. If there's no natural comparison functions (such as for complex numbers), you can invent one.

    Using a red-black tree that almost always is just a single root element with MurmurHash is much faster than trying to be "secure" by using SipHash and then stupidly using linked lists to resolve hash collisions (which caused the need for the absurdly slow SipHash in the first place).

    In theory, with non-power-of-2-sized hash tables where the size is rarely varying, you could use the "fast division by invariant integers using multiplication" trick but that's slower than power-of-2-sizing and bitwise AND.

    The prime sizing is just for really poor hash functions. MurmurHash, although it suffers from seed-independent multicollisions, does not suffer from collisions with reasonable (non-attacker-generated) keys if the table size is a power of 2.