PS: Given that discussions on the motivations or trade-offs of JDK implementation details are often met with resistance on StackOverflow, with a prevailing sentiment that JDK engineers are entitled to make decisions independently (evidenced by the closure of a previous post on JDK motivation), I want to clarify that this question is strictly focused on the algorithmic and structural aspects of HashMap, particularly the analysis and engineering considerations involved in choosing between two separate chaining implementations (head insertion and tail insertion).
In separate-chaining HashMaps, collisions are managed by linking entries in a linked list. When a collision occurs, the new element can be inserted at the head or the tail of the list. Both approaches have the same worst-time complexity, as a full scan of the list is required to identify duplicates or the insertion point. However, conventional wisdom and algorithmic education (e.g., Sedgewick's Algorithms source, Introduction to Algorithms, CLRS p.258) suggest that inserting at the head is preferable, as newer elements are more likely to be accessed soon.
Curiously, while JDK7's HashMap
(lines 402, 766, and addEntry()
method in JDK 7 source code) follows this convention by inserting at the head, JDK8 switches to tail insertion (lines 611, 641, and putVal()
method in JDK 8 source code).
My question is: What are the trade-offs between head insertion and tail insertion in separate-chaining HashMaps? Are there practical engineering considerations, such as multithreading issues, that might influence this choice? I've come across several discussions (e.g., this blog) suggesting that insertion at the head could lead to dead loops if not synchronized properly. Can anyone provide insights or further resources on this topic?
Note: I'm seeking an in-depth technical analysis and any relevant experiences or resources. Thank you for your contributions!
in general, more recently inserted elements have more chances to be looked up
That might be true with additional assumptions. For example, if there is a large, long-living structure, possibly stored on a disk, older elements may be 'outdated' and not looked up anymore.
Here we are talking about a structure stored in memory, typically short-living, used to make some computations and deleted afterward. If there is no assumption about the relation between the insertion order and access frequency, there is no reason to assume that more recent elements are accessed more often.
Also, where a value is inserted has nothing to do with concurrency. In such cases, a synchronized, thread-safe structure like ConcurrentHashMap
should be used, and both methods would work.
With that being said, the JDK can implement it either way. I think the choice that was more convenient and resulted in clearer code has been made. I guess JDK 7 inserts into the head because it reduces the complexity by avoiding the necessity of checking if there is already any value for the given hash in the table. JDK 8 has changed the implementation significantly. Here when inserting a new node, we are just after reaching the last node in the list, and it might look more natural to the author to write
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
than
if ((e = p.next) == null) {
tab[i] = newNode(hash, key, value, tab[i]);
but both ways would work fine.