Tags: c++, language-lawyer, atomic

Does a release sequence headed by an RMW establish a synchronization relationship?


#include <iostream>
#include <atomic>
#include <thread>
#include <cassert>
int main(){
    std::atomic<int> v{0};
    int x = 0;
    std::thread t1([&](){
        x = 1;
        v.exchange(1,std::memory_order::release); // #1
    });
    std::thread t2([&](){
        int expected = 1;
        while(!v.compare_exchange_strong(expected,3,std::memory_order::relaxed)){ // #2
            expected = 1; // a failed CAS overwrites expected with the current value
        }
        v.fetch_sub(1,std::memory_order::relaxed); // #3
    });
    std::thread t3([&](){
        int expected = 2;
        while(!v.compare_exchange_strong(expected,10,std::memory_order::relaxed)){ // #4
            expected = 2; // likewise reset, so the loop exits only after reading 2
        }
        assert(x == 1);
    });
    t1.join();
    t2.join();
    t3.join();
}

[intro.races] p5 says:

A release sequence headed by a release operation A on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation is an atomic read-modify-write operation.

In this example, the modification order of v is {0, 1, 3, 2, 10}, since #4 reads the value written by #3, and #3 is part of the release sequence headed by #1. Hence, the assert never fails. Is this a correct understanding?


Solution

  • You're correct that any release operation can head a release sequence, whether it is a plain store or the store part of an RMW. And you're correct in your analysis that (#1, #2, #3, #4) is a release sequence headed by #1, and that #4 reads the value written by #3.

    However, you cannot conclude that #1 synchronizes with #4, because #4 is not an acquire operation. Remember that in trying to get synchronization from acquire/release operations, we are always appealing to [atomics.order] p2:

    An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

    Every element of this definition is satisfied by taking A = #1, M = v, B = #4, and "any side effect in the release sequence" = #3, except that B = #4 does not perform an acquire operation. So we do not get to conclude that #1 synchronizes with #4. As a result, the store of x in t1 does not happen before the load of x in t3 (*) , nor vice versa. So the program contains a data race, and its behavior is undefined; in particular, the assert may fail.

    The fact that #4 is itself an element of the release sequence headed by #1 isn't relevant to this analysis. (Indeed, the fact that it's an RMW is irrelevant to this program, because the stored value 10 is never loaded.)

    It's not hard to see how the program can fail in practice with a typical implementation: since #4 is relaxed, the compiler and/or CPU are free to reorder it with the following load of x. So the load from x may take place long before the updated value is seen in v. In particular, nothing stops it from taking place before the store to x from t1, and so the old value 0 could be loaded.

    If you upgraded #4 to memory_order_acquire or stronger, then you could indeed conclude that the program has well-defined behavior and that the assert cannot fail. (It's still irrelevant that #4 does an RMW; the conclusion would be the same if it were a plain acquire load.)
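    To make that concrete, here is a sketch of the fixed program (assumptions: same thread structure as the question, with #4 upgraded to std::memory_order::acquire, and each compare_exchange loop resetting expected after a failure, since a failed CAS overwrites expected with the value it read):

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <iostream>
    #include <thread>

    int main() {
        std::atomic<int> v{0};
        int x = 0;
        std::thread t1([&] {
            x = 1;
            v.exchange(1, std::memory_order::release); // #1: heads the release sequence
        });
        std::thread t2([&] {
            int expected = 1;
            // #2 and #3: relaxed RMWs; being RMWs, they extend the
            // release sequence headed by #1
            while (!v.compare_exchange_strong(expected, 3, std::memory_order::relaxed))
                expected = 1; // a failed CAS overwrites expected with the current value
            v.fetch_sub(1, std::memory_order::relaxed);
        });
        std::thread t3([&] {
            int expected = 2;
            // #4 is now an acquire operation that reads #3's value, a side effect
            // in the release sequence headed by #1, so #1 synchronizes with #4
            while (!v.compare_exchange_strong(expected, 10, std::memory_order::acquire))
                expected = 2;
            assert(x == 1); // x = 1 now happens before this load; the assert cannot fire
        });
        t1.join();
        t2.join();
        t3.join();
        std::cout << "ok\n";
    }
    ```

    Only the successful CAS needs acquire ordering here; with the two-order overload of compare_exchange_strong, the failure ordering could stay std::memory_order::relaxed.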


    (*) To be pedantic, I haven't actually proved that the store of x does not happen before the load of x. But I've shown that [atomics.order] p2 doesn't lead us to conclude that it does happen before, and I claim (without proof) that no other provision of the memory model allows us to reach that conclusion either. So at best, it is unspecified whether the store happens before the load, and since UB results if it doesn't, this effectively means the behavior of the program is undefined.