lock-free

How does the epoch-based reclamation work step-by-step, specifically, the EBR with 3 epochs in it?


I checked some online resources to find out how the EBR epoch-based reclamation strategy work in c++ lockfree code. But I still do not fully understand it.

Is there a nice and simple way to learn it step-by-step?


Solution

  • Epoch based reclamation (EBR) was first described by Keir Fraser in his thesis Practical lock-freedom - so this can be seen as the canonical reference.

    EBR is based the concept of grace periods. A grace period is a time interval [a, b] such that, after time b, all nodes removed before time a can safely be reclaimed. Nodes that have been removed from data structures are kept in limbo lists that hold the references to the nodes until it is safe to reclaim them (i.e., until no stale references can possibly exist).

    In EBR the programmer has to define a critical region in which a thread is allowed to access shared objects. (Note: this critical region has nothing to do with the term critical section that is often used in the context of mutual exclusion) These regions have to be entered and left explicitly. A global epoch count is used to determine when no stale references exist to any object in a limbo list.

    Every thread has a flag that indicates whether this thread is currently in a critical region, as well as a local epoch count that identifies the epoch it currently executes in (in case it currently is inside a critical region). The thread's local epoch count may lag at most one epoch behind the global epoch. Each time a thread enters a critical region it sets the flag and observes the current epoch, i.e., it updates its local epoch to match the global epoch. A thread that removes a node from a data structure places this node on the current limbo list (i.e., the limbo list that is associated with the current epoch).

    After some predetermined number of critical region entries, a thread will attempt to update the global epoch. This succeeds only if all threads in a critical region have already observed the current epoch. In that case the limbo list that was populated two epochs ago can safely be reclaimed and the list itself can immediately be recycled and reused for the next epoch. Thus only three epochs (and limbo lists) are required in total.