c++multithreadinglock-freereference-counting

How does the split reference counting work in a lock free stack?


I am reading C++ concurrency in action 2nd. It introduces split reference counts for a lock free stack.

One possible technique involves the use of not one but two reference counts for each node: an internal count and an external count. The sum of these values is the total number of references to the node. The external count is kept alongside the pointer to the node and is increased every time the pointer is read. When the reader is finished with the node, it decreases the internal count. A simple operation that reads the pointer will leave the external count increased by one and the internal count decreased by one when it’s finished.

Above phrase explains the brief concept of split reference count. It sounds like that external count is always increased and internal count is always decreased.

When the external count/pointer pairing is no longer required (the node is no longer accessible from a location accessible to multiple threads), the internal count is increased by the value of the external count minus one and the external counter is discarded. Once the internal count is equal to zero, there are no outstanding references to the node and it can be safely deleted. It’s still important to use atomic operations for updates of shared data. Let’s now look at an implementation of a lock-free stack that uses this technique to ensure that the nodes are reclaimed only when it’s safe to do so.

But in the above phrase, the internal count should be increased by the value of the external count minus one when the node is no longer accessible. I'm very confused about this explanation.

(1) What is the exact purpose of using internal and external count?

(2) Why does it require two reference counts instead of one?

Edit: I added example code below from the book.

template <typename T>
class lock_free_stack {
 private:
  struct node;
  struct counted_node_ptr {
    int external_count;
    node* ptr;
  };
  struct node {
    std::shared_ptr<T> data;
    std::atomic<int> internal_count;
    counted_node_ptr next;
    node(T const& data_)
        : data(std::make_shared<T>(data_)), internal_count(0) {}
  };
  std::atomic<counted_node_ptr> head;

  void increase_head_count(counted_node_ptr& old_counter) {
    counted_node_ptr new_counter;
    do {
      new_counter = old_counter;
      ++new_counter.external_count;
    } while (!head.compare_exchange_strong(old_counter, new_counter,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed));
    old_counter.external_count = new_counter.external_count;
  }

 public:
  ~lock_free_stack() {
    while (pop())
      ;
  }

  void push(T const& data) {
    counted_node_ptr new_node;
    new_node.ptr = new node(data);
    new_node.external_count = 1;
    new_node.ptr->next = head.load();
    while (!head.atomic_compare_exchange_weak(new_node.ptr->next, new_node,
                                              std::memory_order_release,
                                              std::memory_order_relaxed))
      ;
  }

  std::shared_ptr<T> pop() {
    counted_node_ptr old_head = head.load(std::memory_order_relaxed);
    for (;;) {
      increase_head_count(old_head);
      node* const ptr = old_head.ptr;

      if (!ptr) {
        return std::shared_ptr<T>();
      }

      if (head.compare_exchange_strong(old_head, ptr->next,
                                       std::memory_order_relaxed)) {
        std::shared_ptr<T> res;
        res.swap(ptr->data);
        int const count_increase = old_head.external_count - 2;
        if (ptr->internal_count.fetch_add(
                count_increase, std::memory_order_release) == -count_increase) {
          delete ptr;
        }
        return res;
      } else if (ptr->internal_count.fetch_add(-1, std::memory_order_relaxed) ==
                 1) {
        ptr->internal_count.load(std::memory_order_acquire);
        delete ptr;
      }
    }
  }
};

Solution

  • Short explanation

    Split reference count decreases contention. Adding external_count-2 to internal_count separates two phases of reference counting in this implementation: in the first phase reference count is split and in the second phase reference count is in internal_count only.

    Please read below for detailed explanation.

    Reference counting

    Let's start with why we need reference counting in this case of lock-free stack from the book. We need to prevent accessing next field of node pointed by head, if this node has been deleted. And we want to delete node pointed by head when it is not used anymore. To achieve this, we delay deleting this node using reference counting.

    It is important that node can be referenced only by head (when node is first in the list) or next pointer of previous node (when node is in the middle or in the end of the list), but not both at the same time - because there are no situations when the head points to the node in the middle or in the end of the list. This is why we do not need a separate control block to be pointed from multiple counted node pointers (as in shared_ptr). So, we can put counter directly into the head pointer.

    Although we need to protect access to fields of node pointed by head.ptr only (actually to field next only), putting counter into head pointer is not enough. We also have to put counter into the next pointer - to continue protecting the node from being deleted even if the node is pushed and popped before it. We save and restore the head pointer counter into next pointer if we push the node B before the node A that is being popped by other thread Tb and then popping node B before thread Tb.

    Incrementing counter and loading current pointer should be a single atomic operation to ensure that pointer that is dereferenced gets its counter increased.

    Why split reference counting?

    We split counter into internal and external for two reasons:

    1. We do not need an atomic operation on both counter and pointer when decrementing counter. It is enough to atomically change counter. If we had a single counter, we would have to decrement pointer in a loop like we do in increase_head_count.

    2. Using two atomic counters reduces contention, although it increases code complexity, memory usage and requires to add two counters for checking.

    To protect node from being deleted we increment external_count. internal_count is inside the node that counted pointer is pointing to.

    Back to non-split reference counting

    As soon as header pointer is moved to the next node in compare_exchange operation during pop operation, no thread can increment pointer counter, because threads which have read previous head will fail compare_exchange operation, and threads that have not read head yet will never read this head, as head has moved to the next node. This is why external count/pointer pairing is no longer needed. Its value is added to internal_count. After this, counter is no more split: we are working with a single counter internal_count now, which has aggregated all counter increases and decreases in a single variable.

    We can think about this as two phases: in the first phase our counter is split (into external_count and internal_count) and in the second phase our counter is merged into a single variable - internal_count. In the first phase we need to add external_count to internal_count to get real counter value. In the second phase we have to use internal_count only, because external_count has some old value that does not make any sense now (as the book says, now "external counter is discarded").

    Why subtract 2 from external_count?

    When adding external_count to internal_count, external_count value is decreased by 2. This operation can be divided into three for simplicity:

    1. Add external_count to internal_count moving to the second phase (now we do not use external_count anymore).
    2. Decrement internal_count by 1 because head no longer points to this node (remember that we set external_count to 1 for every new node because head points to this node).
    3. Decrement internal_count by 1 to signal that this pointer protection is no longer needed in the current thread (remember that we increased external_count by 1 in increase_head_count to protect pointer)

    Comparing result of internal_count.fetch_add(external_count - 2) to 2 - external_count is just the atomic way of comparing resulting internal_count to zero.

    In the code external_count is decremented by 2 before adding to internal_count, and in the book text "internal count is increased by the value of the external count minus one". I think this is a mistake in the text (probably, author was initially planning to additionally decrement internal_count by 1 separately from adding external_count-1).