perfect-hash

Minimal Perfect Hash Function


I have 10^8 random integers in the range [0; 2^63-1]. There are no duplicates. The full list is known at compile-time, but it is just unique random numbers. These numbers never change.

To store one integer explicitly, 8 bytes are required, and each integer has an associated 1-byte value, so explicit storage requires about 860 MB.

So, I want to find a minimal perfect hash function to map each of the 10^8 integers from [0;2^63-1] to [0;10^8-1]. I should find this function only once, the data never changes, and the function may be complicated if necessary. But it should be minimal, perfect, and calculation should be fast. How can I do this?

Might it be possible to find and use some subsequences (if they occur)?

Thanks.


Solution

  • Let your computer do the work for you:

    http://www.gnu.org/software/gperf/

    Quote: "GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only. "