javahashmaphash-collisionstring-hashing

Will similar Strings in HashMap lead to increased chance of collisions?


Consider the following:

HashMap<String, Object> hm = new HashMap<>();
final String prefix = "My objects ";
int counter = 0;

void put(Object value) {
    hm.put(prefix+(counter++), value);
}

Given that the key of each entry starts with the same string and only differs by a number appended to it, is this likely to lead to more collisions? I'm trying to decide whether this way of creating unique keys is a good idea from a performance perspective.


Solution

  • No it will not. And that is not necessarily because of String#hashcode; but because a HashMap will re-hash whatever your hashcode is by XOR-ing firs 16 bits with the last 16.

    // this is re-hashing that is done internally
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    

    But even if that would increase collision, you might never feel it. For small buckets/bin where entries are placed one after another (in a linked fashion), equals will be called to get the actual entry you care about.

    In case a certain bin/bucket reaches a certain threshold, it will be transformed in a perfectly balanced tree node. The search time in such a tree is 0(logn).

    Even if the same entries report the same hashcode after the re-hash, a map has still to decide which entry is bigger in case of a tie.

    It would then try to invoke Comparable#compareTo in case your Keys implement Comparable. In case they do not implement Comparable, System.identityHashcode will be called to decide in case of a tie.

    As you say from a performance perspective because of all these internal things, your average search time will be O(1) in the Map.