c++ language-lawyer atomic memory-barriers memory-model

Is atomic_thread_fence(memory_order_release) different from using memory_order_acq_rel?

cppreference.com provides this note about std::atomic_thread_fence (emphasis mine):

atomic_thread_fence imposes stronger synchronization constraints than an atomic store operation with the same std::memory_order.

While an atomic store-release operation prevents all preceding writes from moving past the store-release, an atomic_thread_fence with memory_order_release ordering prevents all preceding writes from moving past all subsequent stores.

I understand this note to mean that std::atomic_thread_fence(std::memory_order_release) is not unidirectional the way a store-release is. Instead, it is a bidirectional fence that prevents stores on either side of the fence from being reordered past a store on the other side.

If I understand that correctly, this fence seems to make the same guarantees that atomic_thread_fence(memory_order_acq_rel) does: it is both an "upward" fence and a "downward" fence.

Is there a functional difference between std::atomic_thread_fence(std::memory_order_release) and std::atomic_thread_fence(std::memory_order_acq_rel)? Or is the difference merely aesthetic, to document the purpose of the code?

Solution

A standalone fence imposes stronger ordering than an atomic operation with the same ordering constraint, but this does not change the direction in which ordering is enforced.

Both an atomic release operation and a standalone release fence are uni-directional, but the atomic operation orders preceding memory operations only with respect to itself, whereas the fence orders them with respect to all stores that follow it.

For example, an atomic operation with release semantics:

std::atomic<int> sync{0};

// memory operations A

sync.store(1, std::memory_order_release);

// store B

This guarantees that no memory operation in A (loads and stores) can be (visibly) reordered with the atomic store itself. But it is uni-directional: no ordering rules apply to memory operations that are sequenced after the atomic operation. Therefore, store B can still be reordered with any of the memory operations in A.

A standalone release fence changes this behavior:

// memory operations A

std::atomic_thread_fence(std::memory_order_release);

// load X

sync.store(1, std::memory_order_relaxed);

// stores B

This guarantees that no memory operation in A can be (visibly) reordered with any of the stores that are sequenced after the release fence. Here, the stores in B can no longer be reordered with any of the memory operations in A, and as such, the release fence is stronger than the atomic release operation. But it is also uni-directional, since the load from X can still be reordered with any memory operation in A.

The difference is subtle and usually an atomic release operation is preferred over a standalone release fence.
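As a rough illustration of the release fence pattern above, here is a minimal sketch in which the fence, followed by a relaxed store, publishes plain data to another thread. The names payload, ready and publish_with_release_fence are hypothetical and not part of the original example; the reader side uses an acquire load, which is one valid way to pair with the fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: publishes `payload` using a release fence followed by
// a relaxed store, and returns what the consumer thread observed.
int publish_with_release_fence() {
    int payload = 0;            // plain, non-atomic data ("memory operations A")
    std::atomic<int> ready{0};  // synchronization flag
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                         // memory operations A
        std::atomic_thread_fence(std::memory_order_release);  // standalone release fence
        ready.store(1, std::memory_order_relaxed);            // store sequenced after the fence
    });

    std::thread consumer([&] {
        // The acquire load that reads 1 synchronizes with the release fence.
        while (ready.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        observed = payload;  // the write to payload happens-before this read
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

Calling publish_with_release_fence() should return 42: once the consumer sees ready == 1, the write to payload made before the fence is guaranteed to be visible.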

The rules for a standalone acquire fence are similar, except that it enforces ordering in the opposite direction and operates on loads:

// loads B

sync.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire);

// memory operations A

No memory operation in A can be reordered with any load that is sequenced before the standalone acquire fence.
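The mirror image can be sketched the same way: a relaxed load followed by a standalone acquire fence on the reader side, paired here with an ordinary release store on the writer side. Again, payload, ready and consume_with_acquire_fence are hypothetical names chosen for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: the consumer uses a relaxed load plus an acquire fence
// instead of an acquire load, and returns the value it observed.
int consume_with_acquire_fence() {
    int payload = 0;            // plain, non-atomic data
    std::atomic<int> ready{0};  // synchronization flag
    int observed = -1;

    std::thread producer([&] {
        payload = 7;
        ready.store(1, std::memory_order_release);  // atomic release operation
    });

    std::thread consumer([&] {
        // Relaxed load: on its own it provides no ordering.
        while (ready.load(std::memory_order_relaxed) != 1) {
            std::this_thread::yield();
        }
        // The fence orders the load above before everything that follows it.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = payload;  // "memory operations A", sequenced after the fence
    });

    producer.join();
    consumer.join();
    assert(observed == 7);
    return observed;
}
```

One reason to write it this way is that the fence is only paid once, after the spin loop exits, rather than on every iteration of the load.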

A standalone fence with std::memory_order_acq_rel ordering combines the logic for both acquire and release fences.

// memory operations A
// load A

std::atomic_thread_fence(std::memory_order_acq_rel);

// store B
// memory operations B

But this can get incredibly tricky once you realize that a store in A can still be reordered with a load in B. Acq/rel fences should probably be avoided in favor of regular atomic operations, or even better, mutexes.
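To make the store/load case concrete, here is a sketch of the classic "store buffering" test, where each thread stores to one variable, executes a fence, and then loads the other variable. The helper names are hypothetical. With std::memory_order_acq_rel fences, both loads are allowed to return 0 (whether that actually shows up depends on the hardware and on timing). For contrast, and going slightly beyond the cases discussed above, a std::memory_order_seq_cst fence does rule that outcome out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times with
// the given fence ordering and counts the runs in which both loads read 0.
int count_both_zero(std::memory_order fence_order, int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);   // store before the fence
            std::atomic_thread_fence(fence_order);
            r1 = y.load(std::memory_order_relaxed);  // load after the fence
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(fence_order);
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}

// Usage sketch: with seq_cst fences the "both zero" outcome is forbidden.
void check_seq_cst_fences() {
    assert(count_both_zero(std::memory_order_seq_cst, 200) == 0);
}
```

Calling count_both_zero(std::memory_order_acq_rel, n) may return a non-zero count, which is exactly the store-in-A versus load-in-B reordering described above. No particular count is guaranteed, and on many runs it will simply be 0.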