Under the hood, Java's Cleaner uses a doubly linked list to hold PhantomReferences until their referents become phantom reachable. When the Cleaner's daemon thread takes a PhantomReference from its ReferenceQueue and removes it from the list, the whole list must be locked (the removal code is synchronized on the list).
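For reference, this is the public API whose internals I'm asking about; a minimal usage sketch (class and action names are mine, not from the JDK):

```java
import java.lang.ref.Cleaner;

// Minimal usage sketch of the API under discussion (names are illustrative).
class Resource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The action must not capture the Resource instance, or the referent
    // could never become phantom reachable.
    private static final class ReleaseAction implements Runnable {
        @Override
        public void run() {
            System.out.println("releasing native resource");
        }
    }

    private final Cleaner.Cleanable cleanable =
            CLEANER.register(this, new ReleaseAction());

    @Override
    public void close() {
        cleanable.clean(); // runs the action at most once and unregisters it
    }
}
```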
In contrast, a ConcurrentHashMap could still offer O(1) access to PhantomReferences, while also supporting concurrent modification: multiple cleaner threads could remove PhantomReferences (as keys) from the map at the same time.
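A hypothetical sketch of that alternative (this is not how the JDK is implemented; the class and method names are made up for illustration):

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical map-based registry, for illustration only.
class MapBasedCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final ConcurrentHashMap<PhantomReference<?>, Runnable> registry =
            new ConcurrentHashMap<>();

    void register(Object referent, Runnable action) {
        registry.put(new PhantomReference<>(referent, queue), action);
    }

    // Safe to call from several cleaner threads at once:
    // ReferenceQueue.remove() and ConcurrentHashMap.remove() are thread safe.
    void drainOnce() throws InterruptedException {
        PhantomReference<?> ref = (PhantomReference<?>) queue.remove();
        Runnable action = registry.remove(ref);
        if (action != null) {
            action.run();
        }
    }
}
```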
Is there a reason Java went with the doubly linked list approach? Is it because a Cleaner only uses a single daemon thread for cleanup, so the lack of concurrency doesn't matter? Or is there a deeper reason for that choice?
Performance with the doubly linked list is much better than with a ConcurrentHashMap, because the cleaner works directly with the node objects of the doubly linked list: the PhantomReference taken from the queue is itself the list node, so no lookup is needed at all.
Removing a node just means adjusting two pointers in the list (the next pointer of the previous node and the prev pointer of the next node) and two pointers in the node itself (to mark it as removed). The source code for this is 12 lines (https://github.com/openjdk/jdk21u/blob/master/src/java.base/share/classes/jdk/internal/ref/PhantomCleanable.java#L101).
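In simplified form (an approximation of that logic, not the exact JDK source), the removal boils down to four plain field writes while holding the lock on the list:

```java
// Simplified approximation of PhantomCleanable's removal logic.
class Node {
    static final Object list = new Object(); // stands in for the shared list head
    Node prev = this;
    Node next = this;

    boolean remove() {
        synchronized (list) {     // the whole list is locked, as noted above
            if (next != this) {   // node is still linked
                next.prev = prev; // successor now skips this node
                prev.next = next; // predecessor now skips this node
                prev = this;      // self-loop ...
                next = this;      // ... marks the node as removed
                return true;
            }
            return false;         // already removed
        }
    }
}
```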
Compare these 12 lines with the work that ConcurrentHashMap needs to do to remove a single entry (https://github.com/openjdk/jdk21u/blob/master/src/java.base/share/classes/java/util/concurrent/ConcurrentHashMap.java#L1110): that is more than 70 lines of much more complex code, which has to hash the key, locate the bin, and traverse the bin's nodes before it can unlink anything.
Both operations are O(1), but that doesn't mean ConcurrentHashMap needs the same amount of time; it only means that in both data structures the runtime doesn't grow as they hold more entries. The constant factor behind the list removal (four plain field writes under a lock) is far smaller than the constant factor behind the map removal.