pythonhashtablehashcodedictionary

What's a correct and good way to implement __hash__()?


What's a correct and good way to implement __hash__()?

I am talking about the function that returns a hashcode that is then used to insert objects into hashtables aka dictionaries.

As __hash__() returns an integer and is used for "binning" objects into hashtables I assume that the values of the returned integer should be uniformly distributed for common data (to minimize collisions). What's a good practice to get such values? Are collisions a problem? In my case I have a small class which acts as a container class holding some ints, some floats and a string.


Solution

  • An easy, correct way to implement __hash__() is to use a key tuple. It won't be as fast as a specialized hash, but if you need that then you should probably implement the type in C.

    Here's an example of using a key for hash and equality:

    class A:
        def __key(self):
            return (self.attr_a, self.attr_b, self.attr_c)
    
        def __hash__(self):
            return hash(self.__key())
    
        def __eq__(self, other):
            if isinstance(other, A):
                return self.__key() == other.__key()
            return NotImplemented
    

    Also, the documentation of __hash__ has more information, that may be valuable in some particular circumstances.