Let's say I have the following code, which uses RMW operations and runs on a CPU with a weak memory model (WMM), for example ARM:
```cpp
#include <atomic>

std::atomic<int> shared;
std::atomic<bool> t0_is_last;
std::atomic<bool> t1_is_last;

void t0() {
    // t0 sets bit 0; if only bit 1 was set before, t0 was the last to modify shared
    int result0 = shared.fetch_or(1 << 0, std::memory_order_relaxed);
    if (result0 == (1 << 1)) {
        t0_is_last.store(true, std::memory_order_relaxed);
    }
}

void t1() {
    // t1 sets bit 1; if only bit 0 was set before, t1 was the last to modify shared
    int result1 = shared.fetch_or(1 << 1, std::memory_order_relaxed);
    if (result1 == (1 << 0)) {
        t1_is_last.store(true, std::memory_order_relaxed);
    }
}
```
Here, two threads, `t0` and `t1`, call `fetch_or` on the same variable with relaxed memory ordering. If one of the threads sees that it was the last of the two to perform its `fetch_or` on `shared`, it stores `true` (also with relaxed ordering) into its own flag (`t0_is_last` or `t1_is_last`), recording that this thread was the last one to modify `shared`.
So, is it true that exactly one thread (not zero, and not both) gets to set its `tX_is_last` variable?
In other words: is it true that, after `t0` and `t1` finish, `t0_is_last ^ t1_is_last` holds?
Or, put yet another way: is the value returned by a read-modify-write operation always exactly the value the variable held immediately before that operation's write (that is, exactly the value that was modified)?
The processor architecture doesn't matter.
When you write C++ code, you are writing against C++'s memory model, not that of the architecture you are compiling for. It is the compiler's responsibility to translate the semantics of your code under the C++ memory model into instructions appropriate for the target architecture's memory model, so that the observable behavior is one the C++ abstract machine permits.
C++'s memory model guarantees that for each atomic object there is a single total order of all modifications to it (its modification order). An RMW operation such as `fetch_or` is guaranteed to return the value immediately preceding its own modification in this modification order. See [atomics.order]/10.
So, assuming you intended `shared` to be initialized to `0` (which your code doesn't do explicitly; beware that before C++20 default construction leaves a `std::atomic` uninitialized, and only objects with static storage duration, such as namespace-scope variables, are zero-initialized regardless), then either the first thread's `fetch_or` comes first in the modification order, storing `1`, and the second thread's `fetch_or` must therefore read `1`; or the second thread's `fetch_or` comes first, storing `2`, and the first thread's `fetch_or` reads `2`.
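In either case, whichever `fetch_or` comes first in the modification order returns `0`, so that thread's comparison fails, while the later one returns exactly the other thread's bit, so its comparison succeeds.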
So yes, exactly one of the `tX_is_last` stores will happen (assuming there are no other modifications of `shared` anywhere in the program).
Also note that none of this depends on memory ordering: it does not matter which memory order you choose. Memory ordering only affects how accesses to objects other than the atomic object itself (including other atomic objects) are ordered.