c++concurrency c++20 atomic memory-model

What is the effect of the change to the definition of release sequences in the C++20 memory model?

Consider this program:

-- Initially --
std::atomic<int> x{0};
int y{0};
    
-- Thread 1 --
y = 1;                                      // A
x.store(1, std::memory_order_release);      // B
x.store(3, std::memory_order_relaxed);      // C

-- Thread 2 --
if (x.load(std::memory_order_acquire) == 3) // D
    print(y);                               // E

Under the C++11 memory model, if the program prints anything then it prints 1.

In the C++20 memory model, release sequences were changed to exclude writes performed by the same thread. How does that affect this program? Could it now have a data-race and print either 0 or 1?

Notes

This code appears in P0982R1: Weaken Release Sequences which I believe is the paper that resulted in the changes to the definition of release sequences in C++20. In that particular example, there is a third thread making a store to x which disrupts the release sequence in a way that is counter-intuitive. That motivates the need to weaken the release sequence definition.

From reading the paper my understanding is that with the C++20 changes, C will no longer form part of the release sequence headed by B, because C is not a Read-Modify-Write operation. Therefore C does not synchronize with D. Thus there is no happens-before relation between A and E.

Since B and C are stores to the same atomic variable and all threads must agree on the modification order of that variable, does the C++20 memory model allow us to infer anything about whether A happens-before E?

Solution

Your understanding is correct; the program has a data race. The store of 3 does not form part of any release sequence, so D is not synchronized with any release store. There is thus no way to establish any happens-before relationship between any two operations from the two different threads, and in particular, no happens-before between A and E.

I think the only thing you can infer from the load of 3 is that D definitely does not happen before C; if it did, then D would be obliged to load a value that was strictly earlier in the modification order of x [read-write coherence, intro.races p17]. That means in particular that E does not happen before A.

The modification order would come into play if you were to load from x again in Thread 2 somewhere after D. Then you would be guaranteed to load the value 3 again. That follows from read-read coherence [intro.races p16]. Your second load is not allowed to observe anything that preceded 3 in the modification order, so it cannot load the values 0 or 1. This would apply even if all the loads and stores in both threads were relaxed.