I am using JDK 8 on macOS and looking at the following code from HashMap.java:
public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}
Any change made through the returned ks is reflected in keySet, since both always refer to the same underlying set. If that is true, can the method be written as:
public Set<K> keySet() {
    if (keySet == null) {
        keySet = new KeySet();
    }
    return keySet;
}
Do the two code snippets behave equivalently?
If so, why does HashMap use the first variation rather than the second?
Caching the field in a local variable is done to improve performance: the generated bytecode is smaller, and the field is read only once, so a cache miss can occur at most once, among other things.
This is quite an advanced optimization and should be applied only to very frequently executed code. It was likely applied here because HashMap was written for Java 1.2, when the JIT was very basic, so details like this had a considerable impact.
In this case, it is also done to support multi-threaded access.
HashMap is not synchronized; however, it can be shared between threads via safe publication if it is not modified afterwards. If two threads execute the method concurrently, a race condition could occur: the first read, in if (keySet == null), could see a newer value written by another thread, while the second read, in return keySet;, could see the older (null) value. Using a local variable ensures that the if and the return use the same reference once it is non-null, so the method can never return null.
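To make the single-read idiom concrete outside of HashMap, here is a minimal sketch. The class LazyViewDemo, its items list, and its view() method are hypothetical names invented for illustration, not JDK code. It only shows the shape of the pattern: the field is read once into a local, and the local is what gets tested and returned.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical class (not from the JDK) that mirrors the lazy-view idiom
// used by HashMap.keySet().
class LazyViewDemo {
    private final List<String> items = Arrays.asList("a", "b", "c");

    // Lazily created view; like HashMap's keySet field, it is neither
    // volatile nor guarded by a lock.
    private Set<String> view;

    public Set<String> view() {
        Set<String> v = view;   // the only read of the field
        if (v == null) {
            // The view has no mutable state of its own; it delegates to items.
            v = new AbstractSet<String>() {
                @Override public Iterator<String> iterator() { return items.iterator(); }
                @Override public int size() { return items.size(); }
            };
            view = v;           // unsynchronized publish; another thread may also create one
        }
        return v;               // returns the local, so the field is never re-read
    }

    public static void main(String[] args) {
        LazyViewDemo demo = new LazyViewDemo();
        Set<String> first = demo.view();
        Set<String> second = demo.view();
        System.out.println(first == second); // single-threaded: the cached instance is reused
        System.out.println(first.size());
    }
}
```

With this shape, two racing threads may each create their own view object, but neither can observe a non-null value in the check and then return null, because the check and the return use the same local reference.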