javadata-structuresgarbage-collectionweakhashmap

How does a weak hash map know to garbage-collect an object?


I recently found out about the WeakHashMap data structure in Java.

However, I don't understand what it means by it garbage-collects a mapping when it is no longer in ordinary use. How does the data structure know I will no longer use a key in my program? What if I don't refer to a key for a long time?


Solution

  • However, I don't understand what it means by it garbage-collects a mapping when it is no longer in ordinary use.

    OK. Under normal circumstances, when the garbage collector runs it will remove objects that your program can no longer use. The technical term is an "unreachable object", and it means that the program execution has no way to get hold of a reference to the object any more. An object that is unreachable, may be collected in the next GC cycle ... or not. Either way, it is no longer the application's concern.

    In this case, the WeakHashMap uses a special class called WeakReference to refer to the keys1. A weak reference is an object that acts sort of like an indirect pointer (a pointer to an object holding a pointer). It has the interesting property that the garbage collector is allowed to break the reference; i.e. replace the reference it contains with null. And the rule is that a weak reference to an object will be broken when the GC notices that the object is no longer reachable via a chain of normal (strong) or soft references2.

    The phrase "no longer in ordinary use" really means that the key object is no longer strongly or softly reachable; i.e. via a chain of strong and / or soft references.

    How does the data structure know I will no longer use a key in my program?

    The WeakHashmap doesn't do it. Rather, it is the GC that notices that the key is not strongly reachable.

    As part of its normal traversal, the GC will find and mark all strongly reachable objects. Then it goes through all of the WeakReference objects and checks to see if the objects they refer to have been marked, and breaks them if they have not. (Or something like that ... I've never looked at the actual GC implementation. And it is complicated by the fact that it has to deal with SoftReference and PhantomReference objects as well.)

    The only involvement that WeakHashmap has is that:

    What if I don't refer to a key for a long time?

    The criterion for deciding that a weak reference should be broken is not time based.

    But it is possible that timing influences whether not a key is removed. For instance, a key could 1) cease to be strongly reference, 2) be retrieved from the map, and 3) be assigned to a reachable variable making it strongly referenced once again. If the GC doesn't run during the window in which the key is not strongly reachable, the key and its associated value will stay in the map. (Which is what you'd want to happen ...)


    1 - Implementation detail: in recent Java releases, the weak references actually refer to the map's internal Entry objects rather than the keys. This allows broken references to be purged from the map more efficiently. Look at the code for details.
    2 - Soft references are a kind of reference that the GC is allowed to break if there is a shortage of heap memory.