chashperfect-hash

Is it possible to make a minimal perfect hash function in this situation?


I want to create a Hash Map (or another structure, if you have any suggestions) to store key value pairs. The keys will all be inserted at once at the same time as the map is created, but I don't know what the keys will be (arbitrary length strings) until runtime, when I need to create the map.

I am parsing a query string like this "x=100&name=bob&color=red&y=150" (but the string can have an unlimited number of variables and the variables can have any length name).

I want to parse it once and create a Hash Map, preferably minimal and with a perfect hash function to satisfy linear storage requirements. Once the map is created the values won't be modified or deleted, no more key value pairs will be added to the map either, so the entire map is effectively a constant. I'm assuming that a variable doesn't occur twice in the string (IE. "x=1&x=2" is not valid).

I am coding in C, and currently have a function that I can use like get("x") which will return the string "100", but it parses the query string each time which takes O(n) time. I'd like to parse it once when it is first loaded since it is a very large query string and every value will be read several times. Even though I'm using C, I don't need code in C as an answer. Pseudocode, or any suggestions at all would be awesome!


Solution

  • Try GPL'd gperf, or Bob Jenkins' public domain implementation in C

    Procedure:

    Edit Necrolis noted in the comment below that the reference implementations output perfect hash functions in C source code, so you'll need to modify them to generate something like a bytecode for a VM instead. You could also use an interpretative language like embedded Scheme or Lua.

    It would be interesting to know if this is worth the effort over a simple (non-perfect) HashMap when the overhead of creating the perfect hash function is amortized over the lookups

    Another option is Cuckoo hashing which also has O(1) lookups