pythonhashhashtableprimesdouble-hashing

Best prime numbers to choose for a double hashed hash table size?


What are the best prime numbers to choose for a double hashed hash table size?

side info

what I have in mind:

Thanks, any additional questions appreciated


Solution

  • Choose high of twin prime numbers, i. e. when p and p - 2 are primes, choose p as double hash capacity, because hash_code % (size - 2) is a good secondary step function for double hashing algorithm, and modulo prime number is somewhat more "robust" than modulo composite number (if size - 2 is composite).

    For small sizes (somewhere around 1000 or so) choose all primes, except low ones of twin pairs, because twin pairs are too rare in the beginning of natural numbers scale, for good size predictability.

    Add sizes of 5 and 11 (though they are low in twin primes) to better address very small table sizes.

    Exclude numbers that are frequently used in multiplication hash functions, in Java it is 31 that is used in String hash function, I don't know about Python.

    All the above is carefully coded in this Java runnable, with a lot of pre-generated table sizes (trying to keep 0.005 max difference between neighbouring table sizes):

    https://github.com/OpenHFT/Koloboke/blob/0498951705b45be2e1528afd786c03308c36e5dc/lib/impl/src/main/java/net/openhft/koloboke/collect/impl/hash/DHashCapacities.java#L255-L272

    P. S. My personal belief is that double hashing is never an optimal open addressing flavor, because of modulo operations which are disproportionately expensive in modern CPUs. Consider using QHash.