c++ caching arm memory-barriers stdatomic

Is it guaranteed that a read-modify-write operation reads (and returns) the correct old value on weak memory models?


Let's say I have the following code, which uses RMW operations and runs on a CPU with a weak memory model (for example, ARM):

#include <atomic>

std::atomic<int> shared;

std::atomic<bool> t0_is_last;
std::atomic<bool> t1_is_last;

void t0() {
    // Set bit 0; fetch_or returns the value held just before this write.
    int result0 = shared.fetch_or(1 << 0, std::memory_order_relaxed);
    if (result0 == (1 << 1)) {
        t0_is_last.store(true, std::memory_order_relaxed);
    }
}

void t1() {
    // Set bit 1; fetch_or returns the value held just before this write.
    int result1 = shared.fetch_or(1 << 1, std::memory_order_relaxed);
    if (result1 == (1 << 0)) {
        t1_is_last.store(true, std::memory_order_relaxed);
    }
}

Here, two threads, t0 and t1, each fetch_or the same variable with relaxed memory ordering. If a thread sees that it was the last one to complete its fetch_or, it stores true (also with relaxed ordering) to its own flag, t0_is_last or t1_is_last, to record that it was the last thread to fetch_or the variable shared.

So, is it true that exactly one thread (not zero, and not both) gets to modify its tX_is_last variable? In other words, is it true that after t0 and t1 finish, t0_is_last ^ t1_is_last holds?

Put yet another way: is the value returned by a read-modify-write operation exactly the value that was present immediately before its write (that is, the value the operation actually modified)?


Solution

  • The processor architecture doesn't matter.

    When you write C++ code, you are writing against C++'s memory model, not that of the architecture you are compiling for. It is the compiler's responsibility to translate the code's semantics under the C++ memory model into instructions appropriate for the architecture's memory model, so that the observable behavior matches what C++ requires of its abstract machine.

    C++'s memory model guarantees that each atomic object has a single total order of all modifications to it (its modification order). An RMW operation such as fetch_or is guaranteed to return the value that immediately precedes, in this modification order, the modification done by the operation itself. See [atomics.order]/10.
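
    As a rough illustration of that guarantee, here is a minimal sketch in which several threads each perform one relaxed fetch_add on a counter. The helper name collect_old_values and the thread count are made up for this example. Because every RMW returns the value immediately before its own write in the modification order, no two threads can receive the same old value.

    ```cpp
    #include <algorithm>
    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    // Hypothetical helper: each of num_threads threads performs one relaxed
    // fetch_add(1) on a counter that starts at 0 and records the returned
    // (old) value in its own slot. Returns the recorded values, sorted.
    std::vector<int> collect_old_values(int num_threads) {
        assert(num_threads > 0);
        std::atomic<int> counter{0};
        std::vector<int> seen(num_threads, -1);
        std::vector<std::thread> workers;
        for (int i = 0; i < num_threads; ++i) {
            workers.emplace_back([&counter, &seen, i] {
                // Each thread writes only seen[i], so there is no data race.
                seen[i] = counter.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto& w : workers) w.join();  // join() makes the writes visible
        std::sort(seen.begin(), seen.end());
        return seen;
    }
    ```

    With 8 threads, the sorted result should be exactly 0 through 7: every old value is handed out once, regardless of the hardware the program runs on.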

    So, assuming you intended shared to be initialized to 0 (which you currently forget to do; beware that default construction doesn't initialize atomics before C++20), there are two cases. Either the first thread's fetch_or comes first in the modification order, storing 1, so the second thread's fetch_or must read 1; or the second thread's fetch_or comes first, storing 2, and the first thread's fetch_or reads 2.

    In either case, whichever fetch_or comes first in the modification order returns 0, so that thread's comparison fails.

    So yes, exactly one of the tX_is_last stores will happen (assuming there aren't any further modifications of shared in the program).
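
    If you want to watch this happen, the following is a small sketch of a stress harness around the code from the question, with shared and both flags explicitly initialized. The function names count_last_flags and exactly_one_flag_every_time are invented for this example. A passing run doesn't prove anything by itself (the guarantee comes from the standard, not from testing), but it shows the expected outcome.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    // Hypothetical harness: runs the two-thread race once and reports how
    // many of the two "is_last" flags ended up set.
    int count_last_flags() {
        std::atomic<int> shared{0};  // explicitly initialized to 0
        std::atomic<bool> t0_is_last{false};
        std::atomic<bool> t1_is_last{false};

        std::thread t0([&] {
            int old = shared.fetch_or(1 << 0, std::memory_order_relaxed);
            if (old == (1 << 1)) t0_is_last.store(true, std::memory_order_relaxed);
        });
        std::thread t1([&] {
            int old = shared.fetch_or(1 << 1, std::memory_order_relaxed);
            if (old == (1 << 0)) t1_is_last.store(true, std::memory_order_relaxed);
        });
        t0.join();
        t1.join();
        // join() synchronizes with thread completion, so relaxed loads suffice.
        return int(t0_is_last.load(std::memory_order_relaxed)) +
               int(t1_is_last.load(std::memory_order_relaxed));
    }

    // Repeats the race; returns true if every run set exactly one flag.
    bool exactly_one_flag_every_time(int runs) {
        assert(runs > 0);
        for (int i = 0; i < runs; ++i) {
            if (count_last_flags() != 1) return false;
        }
        return true;
    }
    ```

    Calling exactly_one_flag_every_time(1000) from main should return true on any conforming implementation, ARM included.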

    Also note that none of this depends on the memory ordering you choose; the result is the same with any of them. Memory ordering only affects how accesses to objects other than the atomic object itself (including other atomic objects) are ordered.
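
    To make that last point concrete, here is a sketch of a variation that is not in your code: assume each thread also writes a plain int before its fetch_or, and the last thread wants to read the other thread's int. The names data0, data1 and sum_seen_by_last_thread are hypothetical. In that situation the ordering does matter: with memory_order_relaxed, reading the other thread's int would be a data race, so the sketch uses memory_order_acq_rel instead.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    // Hypothetical example: each thread writes its own plain int, then sets
    // its bit. The thread that finds the other's bit already set is "last"
    // and sums both ints.
    int sum_seen_by_last_thread() {
        int data0 = 0, data1 = 0;  // plain, non-atomic objects
        std::atomic<int> shared{0};
        int sum = -1;              // written only by the last thread

        std::thread t0([&] {
            data0 = 10;
            // Release half publishes data0; acquire half makes data1 visible
            // if this RMW reads the value written by t1's RMW.
            int old = shared.fetch_or(1 << 0, std::memory_order_acq_rel);
            if (old == (1 << 1)) sum = data0 + data1;
        });
        std::thread t1([&] {
            data1 = 32;
            int old = shared.fetch_or(1 << 1, std::memory_order_acq_rel);
            if (old == (1 << 0)) sum = data0 + data1;
        });
        t0.join();
        t1.join();
        assert(sum != -1);  // exactly one thread was last and wrote sum
        return sum;
    }
    ```

    Which thread ends up being last is still decided purely by the modification order of shared; the stronger ordering is only there so that the last thread may safely read the other thread's plain int.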