statichashtabledata-oriented-designperfect-hashstatic-allocation

How to create an efficient static hash table?


I need to create small-mid sized static hash tables from it. Typically, those will have 5-100 entries. When the hash table is created, all keys hashes are known up-front (i.e. the keys are already hashes.) Currently, I create a HashMap, which is I sort the keys so I get O(log n) lookup which 3-5 lookups on average for the sizes I care. Wikipedia claims that a simple hash table with chaining will result in 3 lookups on average for a full table, so that's not yet worth the trouble for me (i.e. taking hash%n as the first entry and doing the chaining.) Given that I know all hashes up-front, it seems to be that there should be an easy way to get a fast, static perfect hash -- but I couldn't find a good pointer how. I.e. amortized O(1) access with no (little?) additional overhead. How should I implement such a static table?

Memory usage is important, so the less I need to store, the better.

Edit: Notice that it's fine if I have have to resolve one collision or so manually. I.e. if I could do some chaining which on average has direct access and worst-case 3 indirections for instance, that's fine. It's not that I need a perfect hash.


Solution

  • This is a well studied problem. Dan Bernstein famously used constant databases with guaranteed performance in tinydns en qmail:

    CDB and deratives use a external file for the k-v content, but there is no need to do this if the number a entries is very small.

    Some papers that describe the algorithms en the math: