javacollectionshashmaplinkedhashmapkeyset

Why do we need LinkedHashMap if keySet() maintains order for a HashMap?


public class HashMapKeySet {

public static void main(String[] args) {
    Map<HashCodeSame,Boolean> map=new HashMap();

    map.put(new HashCodeSame(10),true);
    map.put(new HashCodeSame(2),false);

    for(HashCodeSame i:map.keySet())
        System.out.println("Key: "+i+"\t Key Value: "+i.getA()+"\t Value: "+map.get(i)+"\t Hashcode: "+i
                .hashCode());

    System.out.println("\nEntry Set******");
    for(Map.Entry<HashCodeSame, Boolean> i:map.entrySet())
        System.out.println("Key: "+i.getKey().getA()+"\t Value: "+i.getValue()+"\t Hashcode: "+i.hashCode());

    System.out.println("\nValues******");
    for(Boolean i:map.values())
        System.out.println("Key: "+i+"\t Value: "+map.get(i)+"\t Hashcode: "+i.hashCode());

}

static class HashCodeSame{

    private int a;

    public int getA() {
        return a;
    }

    public void setA(int a) {
        this.a = a;
    }

    HashCodeSame(int a){
        this.a=a;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        HashCodeSame that = (HashCodeSame) o;

        return a == that.a;

    }

    @Override
    public int hashCode() {
        return 1;
    }
}

}

If you could see in above example, I have explicitly made the hashcode() return 1 in all cases, to check what happens, when there is a collision of key.hashcode() in hashmap. What happens, a linked list is maintained for these Map.Entry objects such as

1(key.hashcode()) will link to <2,false> will link to <10,true>

(as false value is entered after true value, as I understand).

But when I do keySet(), true is returned first and then false, instead of false returning first.

So , what I am assuming here, since keySet() is a set and set maintains order, we get true and false while iteration. But, then again, why don’t we say hashmap maintains order, as the only way to retrieve is by order. Or Why do we use LinkedHashMap?

 Key: DS.HashMapKeySet$HashCodeSame@1    Key Value: 10   Value: true     Hashcode: 1
Key: DS.HashMapKeySet$HashCodeSame@1     Key Value: 2    Value: false    Hashcode: 1

Entry Set******
Key: 10  Value: true     Hashcode: 1230
Key: 2   Value: false    Hashcode: 1236

Values******
Key: true    Value: null     Hashcode: 1231
Key: false   Value: null     Hashcode: 1237

Now , when I add chsnge the hashcode method to return a like

@Override
    public int hashCode() {
        return a;
    }

I get reverse order. Plus on addition of

    map.put(new HashCodeSame(10),true);
    map.put(new HashCodeSame(2),false);
    map.put(new HashCodeSame(7),false);
    map.put(new HashCodeSame(3),true);
    map.put(new HashCodeSame(9),true);

output received is,

    Key: DS.HashMapKeySet$HashCodeSame@2     Key Value: 2    Value: false    Hashcode: 2
Key: DS.HashMapKeySet$HashCodeSame@3     Key Value: 3    Value: false    Hashcode: 3
Key: DS.HashMapKeySet$HashCodeSame@7     Key Value: 7    Value: false    Hashcode: 7
Key: DS.HashMapKeySet$HashCodeSame@9     Key Value: 9    Value: true     Hashcode: 9
Key: DS.HashMapKeySet$HashCodeSame@a     Key Value: 10   Value: true     Hashcode: 10

Entry Set******
Key: 2   Value: false    Hashcode: 1239
Key: 3   Value: false    Hashcode: 1238
Key: 7   Value: false    Hashcode: 1234
Key: 9   Value: true     Hashcode: 1222
Key: 10  Value: true     Hashcode: 1221

Values******
Key: false   Value: null     Hashcode: 1237
Key: false   Value: null     Hashcode: 1237
Key: false   Value: null     Hashcode: 1237
Key: true    Value: null     Hashcode: 1231
Key: true    Value: null     Hashcode: 1231

Now it again makes me wonder, why is the order coming in sorted way.? Can anyone explain me in detail how does keySet(), entrySet() methods of hashmap works?


Solution

  • HashMap does not have a defined iteration order, and LinkedHashMap does have a specified iteration order.

    The difficulty with HashMap is that it's easy to construct simple examples where the iteration order is quite predictable and is fairly stable, even though this is not guaranteed.

    Suppose for example that you did this:

        Map<String, Boolean> map = new HashMap<>();
        String str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        for (int i = 0; i < str.length(); i++) {
            map.put(str.substring(i, i+1), true);
        }
        System.out.println(map.keySet());
    

    The result is

    [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
    

    Hey! Those are in order! Well, the reason is that String's hashCode() function is pretty lousy, and it's exceptionally lousy for single-character strings. Here is String's hashCode() specification. Essentially it's a sum-and-multiply, but for a single-character string it's just the Unicode value of that char. So the hashcodes of the single-character strings above are 65, 66, ... 90. The internal table of HashMap is always a power of two, and in this case it's 64 entries long. The table entry used is the key's hashCode() value right-shifted by 16 bits and XORed with itself, modulo the table size. (See the code here.) Thus, these single-character strings end up in sequential buckets in the HashMap table, in array positions 1, 2, ... 26.

    Key iteration proceeds sequentially through the buckets, so the keys end up coming out in the same order they were put in. Again, this isn't guaranteed, it just so happens to work this way because of the characteristics of the various pieces of implementation as described above.

    Now consider HashCodeSame where the hashCode() function returns 1 every time. Adding a few of these objects to a HashMap will cause them all to end up in the same bucket, and since iteration traverses sequentially down the linked list, they'll come out in order:

        Map<HashCodeSame, Boolean> map = new HashMap<>();
        for (int i = 0; i < 8; i++) {
            map.put(new HashCodeSame(i), true);
        }
        System.out.println(map.keySet());
    

    (I've added a toString() method that does the obvious thing.) The result is:

    [HCS(0), HCS(1), HCS(2), HCS(3), HCS(4), HCS(5), HCS(6), HCS(7)]
    

    Again, the keys come out in order, because of coincidence of implementation, but for different reasons than above.

    But wait! In JDK 8, HashMap will convert a bucket from a linear linked list to a balanced tree if too many entries appear in the same bucket. This happens if more then 8 entries end up in the same bucket. Let's try that:

        Map<HashCodeSame, Boolean> map = new HashMap<>();
        for (int i = 0; i < 20; i++) {
            map.put(new HashCodeSame(i), true);
        }
        System.out.println(map.keySet());
    

    The result is:

    [HCS(5), HCS(0), HCS(1), HCS(2), HCS(3), HCS(4), HCS(6),
    HCS(18), HCS(7), HCS(11), HCS(16), HCS(17), HCS(15), HCS(13),
    HCS(14), HCS(8), HCS(12), HCS(9), HCS(10), HCS(19)]
    

    The bottom line is that HashMap does not maintain a defined iteration order. If you want a specific iteration order, you must use LinkedHashMap or a sorted map such as TreeMap. Unfortunately, HashMap has an iteration order that is fairly stable and predictable, in fact, just predictable enough to lull people into thinking that its order is well-defined, when in fact it isn't.


    To help combat this problem, in JDK 9, new hash-based collections implementations will randomize their iteration order from run to run. For example:

        Set<String> set = Set.of("A", "B", "C", "D", "E",
                                 "F", "G", "H", "I", "J");
        System.out.println(set);
    

    This printed out the following when run in different invocations of the JVM:

    [I, H, J, A, C, B, E, D, G, F]
    [C, B, A, G, F, E, D, J, I, H]
    [A, B, C, H, I, J, D, E, F, G]
    

    (The iteration order is stable within a single run of the JVM. Also, existing collections such as HashMap do not have their iteration order randomized.)