Threads A, B, and C do separate work (no synchronization is required between them). Once all three complete, thread D will combine their results, so D depends on the completion of A, B, and C.
int a = 0;
int b = 0;
int c = 0;
std::atomic_int D_dependencies{ 3 };
thread A:
    a = 1;
    D_dependencies.fetch_sub(1, std::memory_order_release);
thread B:
    b = 1;
    D_dependencies.fetch_sub(1, std::memory_order_release);
thread C:
    c = 1;
    D_dependencies.fetch_sub(1, std::memory_order_release);
thread D:
    if (D_dependencies.load(std::memory_order_acquire) == 0)
    {
        assert(a + b + c == 3);
    }
My understanding is that RMW operations like fetch_sub form a "release sequence", so the load in thread D should observe all writes if it loads 0 from the atomic variable.
Am I correct?
Yes, that's correct.
There are three overlapping release sequences, so the acquire load syncs-with all three of the release RMWs. The RMWs include release, so they can each head their own release sequence as well as being part of a longer sequence. (acq_rel or seq_cst also include release and would work here.)
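As a rough illustration, here is a minimal runnable sketch of the pattern from the question. The wrapper function `run_once`, the `sum` variable, and the spin loop in D are my additions, not part of the original code: the question uses a single `if`, which may simply not see 0 yet, so the sketch waits until the acquire load returns 0 to make the outcome deterministic.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the A/B/C/D pattern once and returns the sum D saw.
int run_once() {
    int a = 0, b = 0, c = 0;              // plain, non-atomic payloads
    std::atomic_int D_dependencies{3};
    int sum = -1;                         // written only by D, read after join()

    // Each worker writes its payload, then publishes it with a release RMW.
    std::thread A([&] { a = 1; D_dependencies.fetch_sub(1, std::memory_order_release); });
    std::thread B([&] { b = 1; D_dependencies.fetch_sub(1, std::memory_order_release); });
    std::thread C([&] { c = 1; D_dependencies.fetch_sub(1, std::memory_order_release); });

    std::thread D([&] {
        // Assumption: spin instead of the question's one-shot `if`.
        while (D_dependencies.load(std::memory_order_acquire) != 0) {
            std::this_thread::yield();
        }
        // Having read 0, D has synchronized with all three release RMWs,
        // so reading a, b and c here is race-free.
        sum = a + b + c;
    });

    A.join(); B.join(); C.join(); D.join();
    return sum;
}
```

Calling `run_once()` from `main` should always return 3, whatever order the three decrements happen in.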
The standard's guarantee holds whenever its conditions are met: a release store (including the store half of an RMW), followed by zero or more intervening RMW operations (of any memory_order), and then an acquire operation that reads the value written by that release operation or by a later RMW in the chain. The acquire operation then syncs-with the original release operation.
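To make the "of any memory_order" part concrete, here is a small sketch with one release RMW and one relaxed RMW on the same counter. All names (`run_with_relaxed_rmw`, `payload`, `pending`, `producer`, `bystander`, `consumer`) are made up for illustration, and the spin loop is again my assumption. The relaxed decrement publishes nothing of its own, but it does not stop the consumer from synchronizing with the producer.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: returns the payload value the consumer observed.
int run_with_relaxed_rmw() {
    int payload = 0;                      // plain, non-atomic data
    std::atomic_int pending{2};
    int seen = -1;                        // written only by consumer, read after join()

    std::thread producer([&] {
        payload = 42;
        // Release RMW: heads a release sequence.
        pending.fetch_sub(1, std::memory_order_release);
    });

    std::thread bystander([&] {
        // Relaxed RMW: has no payload to publish. If it comes later in the
        // modification order, it continues the producer's release sequence.
        pending.fetch_sub(1, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (pending.load(std::memory_order_acquire) != 0) {
            std::this_thread::yield();
        }
        // Whichever decrement wrote the 0, the acquire load synchronizes
        // with the producer's release RMW, so this read is race-free.
        seen = payload;
    });

    producer.join(); bystander.join(); consumer.join();
    return seen;
}
```

Note that the consumer would not be entitled to read plain data written by `bystander`, since a relaxed RMW does not head a release sequence of its own.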
In the formalism of the standard, each release operation heads its own release sequence, and thus you can have overlapping release sequences. (I think; I didn't double-check the standard's wording.)
It also works to think about a chain of RMWs as one release sequence, with acquire operations syncing with every release-or-stronger operation in the chain.
A pure store breaks a release sequence, but you don't have those on D_dependencies.