[SOLVED] Why would my C++ stdlib's shared_ptr use acq_rel ordering when decrementing the atomic ref count?

For my machine's C++ standard library, it uses relaxed ordering to increment the ref count in the shared_ptr's control block, but acq_rel ordering to decrement it. Why would it do this? Why wouldn't it, say, use release ordering in the increment and acquire ordering in the decrement?

Solution

Contrary to a common belief, std::shared_ptr is not actually thread-safe. The only part of the shared_ptr that is thread-safe is the control block with the reference counters. The shared_ptr object itself is not thread-safe. So in order to get a new reference to the shared object, you have to take a copy of an existing shared_ptr, which implies that you already have a safe reference. If the shared_ptr instance itself is shared and accessed by multiple threads concurrently, you have to use some other way to synchronize access to it, e.g., by using atomic_load and atomic_store, which usually use a mutex internally. Therefore, the ref-cnt increment can be relaxed because it is not required to establish any ordering.
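To illustrate the increment side, here is a minimal sketch. StrongCount and count_after_concurrent_increments are hypothetical names made up for this example, not the types any particular standard library uses. The sketch only shows that atomicity alone is enough to keep the count correct when every thread already holds a reference.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for the strong count in a control block.
struct StrongCount {
    std::atomic<long> count{1};  // the creating owner holds the first reference

    // Called when an existing owner is copied. The caller already holds a
    // reference, so the count is at least 1 and the object cannot be destroyed
    // concurrently. Only atomicity is needed here, not ordering.
    void add_ref() noexcept {
        count.fetch_add(1, std::memory_order_relaxed);
    }
};

// Spawns `threads` threads that each take `per_thread` extra references and
// returns the final value of the counter.
long count_after_concurrent_increments(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    StrongCount rc;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&rc, per_thread] {
            for (int i = 0; i < per_thread; ++i) rc.add_ref();
        });
    }
    for (auto& th : pool) th.join();  // join() orders the final load after all increments
    return rc.count.load(std::memory_order_relaxed);
}
```

No increment is lost even though none of them orders anything: the read-modify-write is atomic regardless of the memory order chosen.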

Regarding the decrement in the destructor, it makes sense to consider what exactly we need to order. A shared_ptr allows multiple threads to have safe references to the same shared object. You are responsible for making sure that any concurrent operations on that object are correctly synchronized (if necessary), e.g., if you modify the object's state. This is completely outside the scope of the shared_ptr. However, the shared_ptr is responsible for destroying the object once the last reference to it is dropped. In order to correctly destroy the object, we have to ensure that we have an up-to-date view of that object. Or put another way, the call to the destructor has to happen-after any previous modifications.

When a thread drops a shared_ptr that is not the last reference (i.e., after the decrement the ref-cnt is not zero), there is nothing left to do - the object does not have to be destroyed yet. The only thing we need to ensure is that any modifications we made to the object become visible to the thread that will eventually destroy the object. That is achieved by using memory_order_release for the decrement operation.

When a thread drops the last reference (i.e., after the decrement the ref-cnt is zero), this thread becomes responsible for destroying the object. And in order to establish the required happens-before order, we need it to synchronize-with these release-decrements from the other threads, i.e., we need some acquire-operation. So effectively the last thread has to use memory_order_acquire while all other threads have to use memory_order_release.
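The scenario can be made concrete with a small sketch using the real std::shared_ptr. Payload and sum_seen_by_destructor are hypothetical names invented for this example. Two threads each modify a different field and then drop their reference; whichever thread drops last runs the destructor, and that destructor has to observe both writes.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical payload: the destructor reports what it saw through sum_out.
struct Payload {
    int a = 0;
    int b = 0;
    int* sum_out;
    explicit Payload(int* out) : sum_out(out) {}
    ~Payload() { *sum_out = a + b; }
};

int sum_seen_by_destructor() {
    int sum = -1;
    auto p = std::make_shared<Payload>(&sum);
    assert(p.use_count() == 1);

    // Each thread owns its own shared_ptr copy, writes one field, then drops it.
    std::thread t1([q = p]() mutable { q->a = 1; q.reset(); });
    // Moving p hands the original reference to the second thread, so the
    // destructor runs in whichever of the two threads drops its reference last.
    std::thread t2([q = std::move(p)]() mutable { q->b = 2; q.reset(); });

    t1.join();
    t2.join();
    return sum;  // written by ~Payload() before the last reset() returned
}
```

The two writes touch different fields, so they do not race with each other. The only cross-thread ordering the example relies on is the one provided by the ref-cnt decrements inside reset().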

But we don't know in advance whether we are the last thread - we can only determine that after the decrement. So the simplest solution is to just use memory_order_acq_rel - that way, every decrement synchronizes-with any previous decrement. Theoretically it would also be possible to use just memory_order_release for the decrement and instead perform an additional load(std::memory_order_acquire) or an acquire-fence once we have realized that this was the last reference.
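Both options can be sketched side by side. RefCount and its member functions are hypothetical names for illustration, not the API of any standard library implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified reference counter showing both decrement strategies.
struct RefCount {
    std::atomic<long> count{1};

    void add_ref() noexcept {
        count.fetch_add(1, std::memory_order_relaxed);
    }

    // Variant 1: acq_rel on every decrement.
    // Returns true if the caller dropped the last reference and must destroy.
    bool release_acq_rel() noexcept {
        return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    // Variant 2: release on every decrement, acquire only for the last one.
    // The fence after the decrement makes the earlier release-decrements of
    // the other threads synchronize-with this thread before it destroys.
    bool release_with_fence() noexcept {
        if (count.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            return true;
        }
        return false;
    }
};
```

The caller would destroy the object only when the function returns true. On x86 both variants should end up as the same locked instruction, so any difference is only expected on weakly ordered architectures, where variant 2 skips the acquire barrier for every decrement that is not the last one.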