javahashmapequalshashcode

Why does changing the hashcode of an object used as a key in a HashMap make a lookup return null?


Consider the following scenario:

Object o1 = new Object();
Object o2 = new Object();

HashMap<Object, Object> map = new HashMap<Object, Object>();
map.put(o1, o2);

boolean test1 = map.get(o1) == o2; // This evaluates to true

// Now lets say we alter the state of o1:
o1.setSomeInternalState(Object newState);

boolean test2 = map.get(o1) == o2; // This evaluates to false, because now map.get(o1) returns null

Assume that the class for o1 has overridden equals() and hashCode().

I've encountered this issue during debugging because I had explicitly overridden equals and hashCode on one particular object I'm using in some business logic. I can fully appreciate why the hashcode of the object changes when I alter its state, but why should the map.get(o1) return null because of it? There is only one object, so shouldn't the key's hashcode match?


Solution

  • The HashMap class maps keys to values by running the hashCode of the key through a hash function. The hash function is used to create an index into an array of buckets. For example, a very primitive hash function would be hashCode % tableSize. Changing the key's hashCode would alter the index created by the hash function, meaning there is nothing to be found in that bucket.

    Let's run an example, assuming that the initial hashCode is 15 and the table size is 4:

                             ┌----------------------┐
    15 (initial hashCode) -> | hashCode % tableSize | -> index 3
                             |    (hash function)   |
                             └----------------------┘
    

    So let's insert the value at index 3:

      ┌------┐
    0 | null |
      |------|
    1 | null |
      |------|
    2 | null |
      |------|
    3 | key! | <- insert
      └------┘
    

    Now let's modifiy the key's hashCode so that it is now 13:

                              ┌----------------------┐
    13 (modified hashCode) -> | hashCode % tableSize | -> index 1
                              |    (hash function)   |
                              └----------------------┘
    

    What's at index 1? Nothing, null.

    A lot of things have been simplified here. In a real hash table implementation, the hash function is much more complex to create a more even distribution. Also, the buckets are linked-lists so collisions can be handled.