pythondictionaryhashperfect-hashinjective-function

Injective two-way mappings


I often deal with mappings which are injective. In programming terminology, this can be expressed as a dictionary where all values are unique as well as, of course, all keys.

Is there a memory-efficient data structure for injective mappings with all the time-complexity properties you expect from dictionaries?

For example:

d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

d.get(2) = 'b'  # this works with a normal dictionary
d.get('b', reverse=True) = 2  # but this is not possible

All the solutions in Two way/reverse map seem to require using or combining two sets of mappings, focusing on making it easier to perform operations on a two-way map. This is fine for small dictionaries that fit neatly in memory, but not good for large dictionaries.

The requirement is there should be no additional memory overhead storing the injective two-way map versus a regular dictionary storing only one-way mappings.

I understand dictionaries use a hash table, which use an associative array data type. By definition, associative arrays implement key -> value mappings with unique keys. Is it possible, theoretically or in practice, to produce a smart injective mapping which allows reverse lookup?

If it is not possible, I would appreciate an explanation why such a construct is difficult, or impossible, to implement with the same efficiency as dictionaries.

Update

Following the discussion with @rpy (see comments below), any information on how to set up a python dictionary-like object using a perfect reversible hash function would be useful. But, of course, a working implementation would be ideal (I have already tried perfection).


Solution

  • The net answer to your question is: NO (for any efficient implementation)

    You put up two requirements that can not be fulfilled at the same time:

    1. Do not use extra memory for the reverse mapping
    2. Do not add extra time for doing (reverse) look-ups

    Why are those two restrictions prohibiting a solution?

    Mappings are pairs of values (tuples). The most trivial implementation would be:

    Sequential searching all tuples for a match.

    This would have identical complexity for forward and backward mapping.
    However, this clearly violates the expectation of time-complexity properties you expect from dictionaries:

    If you would allow for O(n) complexity, then searching a tuple set sequentially would give you a proper solution.

    Usually dictionary implementations try to get down to O(1) or at least O(n*log(n)) complexity. This is being achieved by introducing additional data for speeding up look-ups, like hashes or trees. Unfortunately, such aids only help for one direction as they either deal with keys (forward mapping case) or values (reverse mapping case).

    So, as soon as you need to keep look-up complexity down (this also applies for modification complexity, but usually dictionaries are tailored towards fast look-up), you will need to add data for achieving speed.

    The whole issue turns down to the classic memory vs. speed trade-off.

    EDIT:

    An approach for addressing the problem in a general implementation (for cases where keys and values allow for getting a numeric representant if those are not integral numbers in the first place) might be:

    Calculate a hash value for key and one for value and register the tuple under both hash values. This way you can take key or value and identify the matching tuple and return the proper result. This would even work for non injective cases when you allow for returning sets of matching tuples.

    This will require more space (double the hash entries) while keeping look-up complexity within values typical for hash based dicts. You might need to keep an eye on hash bucket size (length of collision chains) especially when value sets of keys and values are not disjoint)