Given:
std::atomic<uint64_t> b;
void f()
{
    std::atomic_thread_fence(std::memory_order::memory_order_acquire);
    uint64_t a = b.load(std::memory_order::memory_order_acquire);
    // code using a...
}
Can removing the call to std::atomic_thread_fence have any effect? If so, is there a succinct example? Keep in mind that other functions may store/load to b and call f.
Never redundant. atomic_thread_fence actually has stricter ordering requirements than a load with mo_acquire. It's poorly documented, but the acquire fence isn't one-way permeable for loads; it preserves Read-Read and Read-Write order between accesses on opposite sides of the fence.
Load-acquires, on the other hand, only require ordering between that load and subsequent loads and stores. Read-Read and Read-Write order is enforced ONLY between that particular load-acquire and the accesses that follow it. Prior loads/stores (in program order) have no restrictions, so they may be reordered past it. Thus the load-acquire is one-way permeable.
The release fence similarly loses one-way permeability for stores, preserving Read-Write and Write-Write order (earlier loads and stores cannot be reordered past any later store). See Jeff Preshing's article https://preshing.com/20130922/acquire-and-release-fences/.
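To make the release-fence side concrete, here is a minimal sketch, not taken from the question. The function name release_fence_demo and the variables x, y and ready are hypothetical. It assumes a C++11 or later compiler with thread support (e.g. -pthread on GCC/Clang). The writer publishes two plain variables through a relaxed store that follows a release fence; the reader uses an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: publishes two plain ints through a relaxed store
// that is preceded by a release fence. Returns the sum the reader saw.
int release_fence_demo() {
    int x = 0, y = 0;              // plain (non-atomic) payload
    std::atomic<int> ready{0};
    int sum = 0;

    std::thread producer([&] {
        x = 1;
        y = 2;
        // Earlier reads and writes may not be reordered past any store
        // that follows this fence in program order.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(1, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        // An acquire load that reads 1 synchronizes with the release fence.
        while (ready.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        sum = x + y;               // both writes are visible here
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Calling release_fence_demo() should return 3, because the reader can only leave its loop after the fence has ordered both payload writes before the store to ready.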
By the way, it looks like you have your fence on the wrong side. See Preshing's other article https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/. With an acquire-load, the load happens before the acquire, so using fences it would look like this:
uint64_t a = b.load(std::memory_order::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order::memory_order_acquire);
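A slightly fuller sketch of this relaxed-load-then-fence pattern follows. It is only an illustration: the function acquire_fence_demo and the variable payload are hypothetical, while b mirrors the atomic from the question. It uses the std::memory_order_* constants, which compile in C++11 through C++20.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical demo: the reader polls b with relaxed loads and only then
// issues an acquire fence. Returns the payload value the reader saw.
uint64_t acquire_fence_demo() {
    std::atomic<uint64_t> b{0};    // same role as b in the question
    uint64_t payload = 0;          // plain data published through b
    uint64_t seen = 0;

    std::thread writer([&] {
        payload = 42;
        b.store(1, std::memory_order_release);
    });
    std::thread reader([&] {
        // The load comes first and is relaxed...
        while (b.load(std::memory_order_relaxed) != 1) {
            std::this_thread::yield();
        }
        // ...and the fence after it supplies the acquire ordering.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

Because the relaxed load that reads 1 is sequenced before the acquire fence, the release store synchronizes with the fence, so acquire_fence_demo() should return 42.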
Remember that release doesn't guarantee visibility. All release does is guarantee the order in which writes to different variables become visible in other threads. (Without this, other threads can observe orderings that seem to violate cause-and-effect.)
Here's an example using the CppMem tool (http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/). The first thread is SC, so we know the writes occur in that order. So if c==1, then a and b should both be 1 as well. CppMem gives "48 executions; 1 consistent, race free", indicating that it is possible for the 2nd thread to see c==1 && b==0 && a==0. This is because c.load is allowed to be reordered after a.load, permeating past b.load:
int main() {
  atomic_int a = 0;
  atomic_int b = 0;
  atomic_int c = 0;
  {{{ {
    a.store(1, mo_seq_cst);
    b.store(1, mo_seq_cst);
    c.store(1, mo_seq_cst);
  } ||| {
    c.load(mo_relaxed).readsvalue(1);
    b.load(mo_acquire).readsvalue(0);
    a.load(mo_relaxed).readsvalue(0);
  } }}}
}
If we replace the acquire-load with an acquire-fence, c.load is not allowed to be reordered after a.load. CppMem gives "8 executions; no consistent", confirming that it is not possible:
int main() {
  atomic_int a = 0;
  atomic_int c = 0;
  {{{ {
    a.store(1, mo_seq_cst);
    c.store(1, mo_seq_cst);
  } ||| {
    c.load(mo_relaxed).readsvalue(1);
    atomic_thread_fence(mo_acquire);
    a.load(mo_relaxed).readsvalue(0);
  } }}}
}
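For readers who want to try the fence version outside CppMem, the same litmus test can be written as a rough C++ stress loop. This is a sketch under assumptions: count_violations and the rounds parameter are hypothetical names. Unlike CppMem's exhaustive search, a runtime test can only fail to find a forbidden outcome; it cannot prove that the outcome is impossible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the fence litmus test `rounds` times and
// counts how often the reader saw c == 1 followed by a == 0.
int count_violations(int rounds) {
    int violations = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> a{0};
        std::atomic<int> c{0};
        int seen_c = 0;
        int seen_a = 0;

        std::thread t1([&] {
            a.store(1, std::memory_order_seq_cst);
            c.store(1, std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            seen_c = c.load(std::memory_order_relaxed);
            // The fence keeps the load of c ordered before the load of a.
            std::atomic_thread_fence(std::memory_order_acquire);
            seen_a = a.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (seen_c == 1 && seen_a == 0) {
            ++violations;          // forbidden by the memory model
        }
    }
    return violations;
}
```

With the fence in place, count_violations should return 0 for any number of rounds, matching CppMem's "no consistent" result.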
Edit: Improved first example to actually show the variable crossing an acquire operation.