java garbage-collection jvm shenandoah

Shenandoah 2.0 elimination of forwarding pointer


In Shenandoah 1.0 every single object had an additional header, called the forwarding pointer. Why was it needed, and what led to its elimination in Shenandoah 2.0?


Solution

  • First of all, every Java object has two headers: klass and mark. They have always been part of every instance (recent JVMs have slightly changed how their flags are handled internally, for example) and they serve several purposes; only one of them is covered in detail further down in this answer.

    The need for a forwarding pointer is explained in the second part of this answer. Shenandoah 1.0 needs it in both the read barrier and the write barrier (reads of some field types could skip the barrier, but I will not go into detail). Put simply, it makes concurrent copying much easier: as that answer says, the collector can atomically switch the forwarding pointer to the new copy of the object and then concurrently update all references to point to that new copy.
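
    The extra indirection described above can be sketched in plain Java. This is only an illustrative model, not Shenandoah's actual code: the class `BrooksObject` and the methods `resolve` and `evacuate` are hypothetical names, and a Java field stands in for the extra header word.

    ```java
    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical model of a Shenandoah 1.0-style object: one extra word
    // (here a field) holds the forwarding pointer next to the payload.
    final class BrooksObject {
        // Points to the object itself until a copy exists, then to the copy.
        final AtomicReference<BrooksObject> forwardee = new AtomicReference<>();
        int i;
        int j;

        BrooksObject(int i, int j) {
            this.i = i;
            this.j = j;
            forwardee.set(this);
        }

        // Barrier sketch: every barriered access pays one extra dereference.
        static BrooksObject resolve(BrooksObject ref) {
            return ref.forwardee.get();
        }

        // Copy first, then publish the copy with a single atomic switch.
        // A thread that loses the race drops its copy and uses the winner's.
        static BrooksObject evacuate(BrooksObject from) {
            BrooksObject copy = new BrooksObject(from.i, from.j);
            if (from.forwardee.compareAndSet(from, copy)) {
                return copy;
            }
            return from.forwardee.get();
        }
    }
    ```

    After `evacuate(a)` succeeds, `resolve(a)` returns the copy even for callers that still hold the old reference, which is what lets the collector update references lazily.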

    Things changed a bit in Shenandoah 2.0, where the "to-space invariant" is in place, meaning that all reads and writes go through the to-space copy. This has one interesting consequence: once the to-space copy is established, the from-space copy is never used. Imagine a situation like this:

        refA            refB
          |               |
    fwdPointer1 ---- fwdPointer2        
                          |
      ---------       ---------  
      | i = 0 |       | i = 0 | 
      | j = 0 |       | j = 0 | 
      ---------       ---------
    

    In Shenandoah 1.0 there were cases where a read via refA could bypass the barrier (not use it at all) and still read from the from-space copy. This was allowed for final fields, for example (via a special flag). It means that even if the to-space copy already existed and there were already references to it, there could still be reads (via refA) that went to the from-space copy. In Shenandoah 2.0 this is prohibited.

    This information was used in a rather interesting way. Every object in Java is aligned to 64 bits (8 bytes), meaning the lowest 3 bits of its address are always zero. So they dropped the forwarding pointer and said: if the lowest two bits of the mark word are 11 (this is allowed since nothing else uses that pattern in this manner), the word is a forwarding pointer; otherwise the to-space copy does not yet exist and this is a plain header. You can see it in action right here and you can trace the masking here and here.
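
    The tagging trick can be illustrated with a few bit operations. This is a minimal sketch under stated assumptions, not HotSpot's implementation: addresses are modelled as plain `long` values, and `MarkWordTag`, `encodeForwardee`, `isForwarded` and `decodeForwardee` are hypothetical names.

    ```java
    // Hypothetical helper showing how an 8-byte aligned address and a
    // two-bit tag can share one 64-bit word.
    final class MarkWordTag {
        static final long TAG_MASK = 0b11L;   // lowest two bits of the word
        static final long FORWARDED = 0b11L;  // pattern meaning "holds a forwardee"

        // Store the to-space address in the word and tag it as a forwardee.
        static long encodeForwardee(long toSpaceAddress) {
            if ((toSpaceAddress & 0b111L) != 0) {
                throw new IllegalArgumentException("address must be 8-byte aligned");
            }
            return toSpaceAddress | FORWARDED;
        }

        // True when the word is a forwarding pointer rather than a plain header.
        static boolean isForwarded(long markWord) {
            return (markWord & TAG_MASK) == FORWARDED;
        }

        // Strip the tag to recover the to-space address.
        static long decodeForwardee(long markWord) {
            return markWord & ~TAG_MASK;
        }
    }
    ```

    Because the aligned address never uses its lowest bits, tagging and untagging lose no information: `decodeForwardee(encodeForwardee(addr))` gives back `addr`.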

    It used to look like this:

    | -------------------|
    | forwarding Pointer |
    | -------------------|
    
    | -------------------|
    |        mark        |
    | -------------------|
    
    | -------------------|
    |        class       |
    | -------------------|
    

    And has transformed to:

    | -------------------|
    | mark or forwarding |     // depending on the last two bits
    | -------------------|
    
    | -------------------|
    |        class       |
    | -------------------|
    

    So here is a possible scenario (I'll skip the class header for simplicity):

      refA, refB            
           |               
          mark   (last two bits are 00)   
           |              
        ---------   
        | i = 0 |      
        | j = 0 |      
        ---------  
    

    GC kicks in. The object referenced by refA/refB is alive and therefore must be evacuated (it is said to be in the "collection set"). First a copy is created, and then the mark word is atomically made to reference that copy (its last two bits are also set to 11, which makes it a forwardee rather than a mark word):

      refA, refB            
           |               
         mark (11) ------  mark (00)   
                               |
        ---------          ---------
        | i = 0 |          | i = 0 |
        | j = 0 |          | j = 0 |
        ---------          ---------
    

    Now one of the mark words has a bit pattern (ends in 11) that indicates that it is a forwardee and not a mark word anymore.

           refA              refB            
             |                 |               
         mark (11) ------  mark (00)   
                               |
        ---------          ---------
        | i = 0 |          | i = 0 |
        | j = 0 |          | j = 0 |
        ---------          ---------
    

    refB can be updated concurrently to point to the to-space copy, and then refA as well; ultimately there are no references left to the from-space object and it becomes garbage. This is how the mark word acts as a forwarding pointer when needed.
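
    The whole scenario can be replayed on a toy heap. The sketch below is an assumption-laden model rather than the real collector: `TinyHeap` and all of its methods are hypothetical, the heap is an array of 64-bit words, an object is three words (mark, i, j), and 01 is assumed to be the plain mark word pattern.

    ```java
    import java.util.concurrent.atomic.AtomicLongArray;

    // Hypothetical toy heap: each slot is one 8-byte word, an object is
    // three consecutive words: mark, i, j.
    final class TinyHeap {
        static final long FORWARDED = 0b11L;
        final AtomicLongArray words = new AtomicLongArray(64);
        private int nextFree = 1; // word 0 stays unused so address 0 can mean null

        // Allocate an object and return its 8-byte aligned address.
        synchronized long allocate(long i, long j) {
            int base = nextFree;
            nextFree += 3;
            words.set(base, 0b01L); // assumed pattern for a plain mark word
            words.set(base + 1, i);
            words.set(base + 2, j);
            return base * 8L;
        }

        // Barrier sketch: follow the mark word if it is tagged as a forwardee.
        long resolve(long address) {
            long mark = words.get((int) (address / 8));
            return (mark & 0b11L) == FORWARDED ? (mark & ~0b11L) : address;
        }

        // Copy the object, then install the forwardee with one CAS on the mark word.
        // Simplification: writes racing with the copy are not handled here.
        long evacuate(long from) {
            int fromBase = (int) (from / 8);
            long mark = words.get(fromBase);
            if ((mark & 0b11L) == FORWARDED) {
                return mark & ~0b11L; // somebody already evacuated it
            }
            long to = allocate(words.get(fromBase + 1), words.get(fromBase + 2));
            words.set((int) (to / 8), mark); // the copy keeps the original mark word
            if (words.compareAndSet(fromBase, mark, to | FORWARDED)) {
                return to;
            }
            return resolve(from); // lost the race; the wasted copy is simply abandoned
        }

        long readI(long ref) {
            return words.get((int) (resolve(ref) / 8) + 1);
        }

        void writeI(long ref, long value) {
            words.set((int) (resolve(ref) / 8) + 1, value);
        }
    }
    ```

    In this model a stale refA still observes a write made through an updated refB, because both accesses resolve to the to-space copy before touching the field.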