The title pretty much says it all: what are these self-healing barriers, and why are they important in Shenandoah 2.0?
This explanation piggy-backs on the first part and the second part of some answers I tried to put together about Shenandoah 2.0.
To really answer this question we need to look at how the load reference barrier is implemented and how a GC cycle works in general.
When a GC cycle is triggered, it first chooses the regions with the most garbage; i.e., the regions that end up in the collection set contain very few live objects (this will matter later).
The simplest way to understand this topic is via an example. Suppose this is the layout that currently exists in a certain region:
refA   refB
    |
---------
| mark  |
---------
| i = 0 |
| j = 0 |
---------
There is an object in the region and there are two references pointing to it: refA and refB. GC kicks in and this region is chosen to be garbage collected. At the same time, there are active threads in the application that try to access this object via refA and refB. Since this object is alive, at some point it needs to be evacuated to a new region (part of the mark-compact phase).
So: GC is active and, at the same time, we read via refA/refB. When we do this read we hit the load-reference-barrier, implemented here. Notice how internally it has some "filters" (via a bunch of if/else statements). Specifically:
- it checks if "evacuation is currently in progress". This is done via a thread-local flag that is set when evacuation first starts. Let's suppose the answer to this is: yes.
- it checks if the object that we are currently operating on is in the "collection set". This means it lives in a region that was selected for evacuation; since we reached it through a live reference, it is alive and has to be moved. Let's suppose this is "yes" also.
- the last check is to find out if this object was already "copied" to a different region (it was evacuated). Let's suppose the answer to this is "no", i.e.: obj == fwd.
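The three filters above can be sketched as a small Java model. This is only an illustration of the control flow described here, not HotSpot's code: ToyObject, BarrierFilters, inCollectionSet and the static evacuationInProgress flag are all hypothetical names, and the forwardee is modelled as an AtomicReference that initially points at the object itself (so obj == fwd means "not copied yet").

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: class and field names are hypothetical, not HotSpot's.
final class ToyObject {
    final boolean inCollectionSet;   // true if the object sits in a region chosen for evacuation
    // Stand-in for the mark word: points to the object itself until a copy exists.
    final AtomicReference<ToyObject> forwardee = new AtomicReference<>();
    int i, j;

    ToyObject(boolean inCollectionSet) {
        this.inCollectionSet = inCollectionSet;
        this.forwardee.set(this);
    }
}

final class BarrierFilters {
    // Stand-in for the thread-local "evacuation in progress" flag.
    static volatile boolean evacuationInProgress = false;

    // Returns the object the caller should actually work with.
    static ToyObject loadReferenceBarrier(ToyObject obj) {
        if (obj == null || !evacuationInProgress) return obj; // filter 1: no evacuation running
        if (!obj.inCollectionSet) return obj;                  // filter 2: object will not move
        ToyObject fwd = obj.forwardee.get();
        if (fwd != obj) return fwd;                            // filter 3: already copied
        return evacuate(obj);                                  // obj == fwd: copy it now
    }

    static ToyObject evacuate(ToyObject obj) {
        ToyObject copy = new ToyObject(false);
        copy.i = obj.i;
        copy.j = obj.j;
        // Only one thread wins the race to install the forwardee; losers use the winner's copy.
        return obj.forwardee.compareAndSet(obj, copy) ? copy : obj.forwardee.get();
    }

    public static void main(String[] args) {
        ToyObject o = new ToyObject(true);
        System.out.println(loadReferenceBarrier(o) == o);      // prints true: evacuation not running
        evacuationInProgress = true;
        ToyObject moved = loadReferenceBarrier(o);
        System.out.println(moved != o);                        // prints true: a copy was created
        System.out.println(loadReferenceBarrier(o) == moved);  // prints true: forwardee is reused
    }
}
```

Note that in this model the caller still holds the old reference after the call, so every later read through it walks through all three filters again.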
At this point, a few things happen. First a copy is created, and the mark of the original object becomes a forwardee that points to that copy:
 refA   refB
     |
--------------      ---------
| forwardee  | ---- | mark  |
--------------      ---------
| i = 0      |      | i = 0 |
| j = 0      |      | j = 0 |
--------------      ---------
Only later in the code would refA and refB be updated to point to the new (copied) object. But that means an interesting thing: until refA and refB are actually made to point to the new object, the object that they currently point to is still in the "collection set". So, if GC is active, then even after the forwardee has been established, the load-reference-barrier still needs to do some work on every read.
So the very smart people behind Shenandoah said this: why not update the references right there, immediately after the forwardee has been established (or when the forwardee is already known for other references)? And this is exactly what they did.
Let's suppose we get back to our initial drawing:
refA   refB
    |
---------
| mark  |
---------
| i = 0 |
| j = 0 |
---------
And again, we "enable" all of the filters:
- there is a thread that reads via refA
- GC is active
- the object behind refA and refB is alive.
This is what will happen with "self healing barriers":
    refB              refA
     |                  |
--------------      ---------
| forwardee  | ---- | mark  |
--------------      ---------
| i = 0      |      | i = 0 |
| j = 0      |      | j = 0 |
--------------      ---------
The difference is obvious: refA was moved to point to the new object via CAS, on the spot. If there is another read via refA (while GC is still active), this will result in a much faster load-reference-barrier execution. Why? Because refA now points to an object that is not in the "collection set", so the barrier exits at that check.
But this also means that if we read via refB and see that fwd != obj, the code can do the same trick and update refB in place, at the time the first read happens via refB.
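To make the effect concrete, here is a minimal Java sketch of the idea, assuming evacuation is in progress for the whole run. It is a toy model, not Shenandoah's implementation: Cell, SelfHealingSketch, load and slowPathHits are hypothetical names, and each reference (refA, refB) is modelled as an AtomicReference slot so that it can be updated with a CAS.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: class and field names are hypothetical, not HotSpot's.
final class Cell {
    final boolean inCollectionSet;   // true if the object sits in a region chosen for evacuation
    // Stand-in for the mark word: points to the object itself until a copy exists.
    final AtomicReference<Cell> forwardee = new AtomicReference<>();

    Cell(boolean inCollectionSet) {
        this.inCollectionSet = inCollectionSet;
        this.forwardee.set(this);
    }
}

final class SelfHealingSketch {
    // Counts loads that had to go past the collection-set check (single-threaded demo counter).
    static int slowPathHits = 0;

    // Reads through a reference slot and heals the slot if the object has moved.
    static Cell load(AtomicReference<Cell> slot) {
        Cell obj = slot.get();
        if (obj == null || !obj.inCollectionSet) return obj;  // fast exit: nothing to fix
        slowPathHits++;
        Cell fwd = obj.forwardee.get();
        if (fwd == obj) {                                      // not copied yet: evacuate
            Cell copy = new Cell(false);
            fwd = obj.forwardee.compareAndSet(obj, copy) ? copy : obj.forwardee.get();
        }
        // Self-healing step: point the slot at the copy. If the CAS fails,
        // another thread already changed the slot, which is fine.
        slot.compareAndSet(obj, fwd);
        return fwd;
    }

    public static void main(String[] args) {
        Cell old = new Cell(true);
        AtomicReference<Cell> refA = new AtomicReference<>(old);
        AtomicReference<Cell> refB = new AtomicReference<>(old);

        Cell viaA = load(refA);   // evacuates and heals refA
        load(refA);               // fast exit: refA already points to the copy
        Cell viaB = load(refB);   // sees fwd != obj and heals refB
        load(refB);               // fast exit

        System.out.println(viaA == viaB);   // prints true: both see the same copy
        System.out.println(slowPathHits);   // prints 2: one slow path per reference
    }
}
```

Without the slot.compareAndSet line, all four loads in main would take the slow path, since both slots would keep pointing into the collection set.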
This improves performance according to the people familiar with the matter, and I trust them.