memorycompiler-constructiongarbage-collectioncontainersgc-roots

How are garbage collector (GC) roots stored?


I know how roots are found, but the thing is, (AFAIK) they have to be found at runtime. For that, you'll need a fixed size container that may overflow or a resizable container. I don't want to go with the fixed size container because it would not be easy to tell how much space to reserve (and it may be wasteful). The resizable container seems best, but the problem is, the GC runs when there is not enough space, so the resizable container won't be able to store what it needs to. So with these conditions, how are GC roots stored?


Solution

  • A GC root is a location outside the heap that can contain a reference to an object inside the heap. A location can be anything that can store a reference. Usually it is four or eight bytes of memory storing 32 or 64 bit addresses but it can also be a machine register or space on disk. Sometimes a location is called a "slot" because you can "slot in" exactly one reference. A classical mark & sweep collector works by first marking all objects referenced by the roots and then continues the tracing from there.

    Where and how roots are stored depends on the VM and gets very complicated when you consider things like partial GC, threading and JITting. But conceptually it's simple. Suppose you have a Python-like language with only functions and globals and the following code:

     0: FOO = "hel"
     1: BAR = "hi"
     2: def foo(x):
     3:    y = x + "there"
     4:    <GC HERE>
     5:    return
     6: def bar(x):
     7:    y = x + "lo"
     8:    foo(y)
     9:    return
    10: bar(FOO)
    11: ...
    

    Assume GC happens on the indicated line. The callstack would look something like this:

    Return address to line 11
    Reference to object "hello"
    Return address to line 9
    Reference to object "hellothere"
    

    The GC would scan through this callstack distinguishing between return addresses and references and marking the objects it finds. Then it would do the same for global references. They could be stored in a dictionary (hashmap) on the heap and referenced by a single root:

    {name("FOO") : "hel", name("BAR") : "hi"}
    

    Note that the space required for storing all roots is tiny. You only need eight bytes (one reference) for the globals and eight bytes for each element on the callstack. You can run out of stack space and get a stack overflow but with say 256kb preallocated for the stack and proper tail call optimization it is a non-issue.