Up to and including C++17, the standard contained the following paragraph (C++17 Section 32.4 [atomics.order] paragraph 6):
For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order::seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order.
Suppose we have a StoreLoad reordering litmus test:
// Thread 1
a.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst); // F1
b.load(std::memory_order_relaxed);
// Thread 2
b.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst); // F2
a.load(std::memory_order_relaxed);
We have two seq-cst fences F1 and F2. My understanding of the paragraph above is that if F1 is ordered before F2 in S, the a.load() by Thread 2 is guaranteed to observe the value from the a.store() by Thread 1 (or some later change). If however F2 is ordered before F1, then the b.load() by Thread 1 is guaranteed to observe the value written by the b.store() in Thread 2.
However, in an attempt to "Strengthen SC fences" for C++20, P0668R5 proposed to replace paragraphs 3-7 with:
An atomic operation A on some atomic object M is coherence-ordered before another atomic operation B on M if
- A is a modification, and B reads the value stored by A, or
- A precedes B in the modification order of M, or
- A and B are not the same atomic read-modify-write operation, and there exists an atomic modification X of M such that A reads the value stored by X and X precedes B in the modification order of M, or
- there exists an atomic modification X of M such that A is coherence-ordered before X and X is coherence-ordered before B.
There is a single total order S on all memory_order::seq_cst operations, including fences, that satisfies the following constraints. First, if A and B are memory_order::seq_cst operations and A strongly happens before B, then A precedes B in S. Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:
- if A and B are both memory_order::seq_cst operations, then A precedes B in S; and
- if A is a memory_order::seq_cst operation and B happens before a memory_order::seq_cst fence Y, then A precedes Y in S; and
- if a memory_order::seq_cst fence X happens before A and B is a memory_order::seq_cst operation, then X precedes B in S; and
- if a memory_order::seq_cst fence X happens before A and B happens before a memory_order::seq_cst fence Y, then X precedes Y in S.
[ Note: This definition ensures that S is consistent with the modification order of any atomic object M. It also ensures that a memory_order::seq_cst load A of M gets its value either from the last modification of M that precedes A in S or from some non-memory_order::seq_cst modification of M that does not happen before any modification of M that precedes A in S. -- end note ]
[ Note: We do not require that S be consistent with "happens before" (6.8.2.1 [intro.races]). This allows more efficient implementation of memory_order::acquire and memory_order::release on some machine architectures. It can produce surprising results when these are mixed with memory_order::seq_cst accesses. -- end note ]
But I fail to see how this definition provides the same guarantee as the C++17 paragraph above. Said paragraph essentially defines how seq-cst fences can ensure visibility of atomic operations. The C++20 definition states "(..) for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B (..)", so it requires A and B to be coherence-ordered. But according to the definition of coherence-ordered that requires B to observe the value written by A, i.e., instead of defining how changes can be made visible we are making visibility a precondition. Am I missing something?
Under the C++17 rules, you proved that if F1 precedes F2 in S, then a.load() in thread 2 observes the value 1 from a.store() in thread 1, or a value from a modification later in the modification order.
Under the C++20 rules, we can prove the same statement by contrapositive.
Suppose a.load() in thread 2 (call it A) does not read the value 1 from a.store() in thread 1 (call it B), nor any later modification. Since A must observe the value of some modification, and the modification order is a total order, it must read the value from some modification X which is earlier than B. Thus by the third bullet in the definition of coherence-order, we have that A is coherence-ordered before B.
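The third-bullet step can be checked on a toy model. The sketch below is not part of the standard and makes simplifying assumptions: it only covers non-read-modify-write operations on a single object M, and the names `Op` and `coherence_ordered_before` are hypothetical. A write is identified by its position in M's modification order, and a read by the position of the write it takes its value from.

```cpp
#include <cassert>

// Hypothetical model of one non-RMW atomic operation on a single object M.
// For a write, pos is its position in M's modification order.
// For a read, pos is the position of the write whose value it observes.
struct Op {
    bool is_write;
    int pos;
};

// Coherence-ordered-before for the model above, following the four bullets
// of the C++20 definition (RMW operations are deliberately left out).
constexpr bool coherence_ordered_before(Op A, Op B) {
    if (A.is_write && B.is_write) {
        return A.pos < B.pos;   // bullet 2: modification order
    }
    if (A.is_write && !B.is_write) {
        return A.pos <= B.pos;  // bullet 1 directly, or bullet 2 then bullet 1 via bullet 4
    }
    if (!A.is_write && B.is_write) {
        return A.pos < B.pos;   // bullet 3: A reads from an earlier write
    }
    return A.pos < B.pos;       // read/read: bullet 3 then bullet 1, chained by bullet 4
}
```

With the initialization of `a` at position 0 and `a.store(1)` at position 1, a load that still sees the initial value is `Op{false, 0}`, and `coherence_ordered_before(Op{false, 0}, Op{true, 1})` is true, which is the situation the contrapositive argument starts from.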
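Now we can apply the fourth bullet of the requirements on S, taking its fence X to be F2 and its fence Y to be F1 (not to be confused with the modification X above). F2 is sequenced before A and B is sequenced before F1, so F2 happens before A and B happens before F1, and we conclude that F2 precedes F1 in S. Since S is a total order, this means that F1 does not precede F2, which is the contrapositive of the claim.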
A similar argument could prove (again by contrapositive) that if F2 precedes F1 then b.load() in thread 1 observes the value stored by b.store() in thread 2.
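Taken together, the two cases mean that the outcome where both loads return 0 is forbidden. A minimal harness for observing this empirically might look like the sketch below. The helper names `run_once` and `count_forbidden` are made up for illustration, and a run that reports zero forbidden outcomes is only consistent with the guarantee; it does not prove it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The two atomics from the litmus test.
std::atomic<int> a{0}, b{0};

// Runs the litmus test once. Returns true if both loads observed 0,
// which is the outcome the seq_cst fences are supposed to rule out.
bool run_once() {
    // Reset before the threads start; thread creation makes these stores
    // happen before everything the two threads do.
    a.store(0, std::memory_order_relaxed);
    b.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        a.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // F1
        r1 = b.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        b.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // F2
        r2 = a.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_forbidden(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (run_once()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

Calling `count_forbidden(2000)` from `main` should return 0 on a conforming implementation. Removing both fences makes the both-zero outcome permitted, although whether it actually shows up depends on the hardware and on timing.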