c++language-lawyermemory-barriersstdatomicmemory-model

Visibility of atomic operations with seq-cst fences in C++20


Until C++17 the standard contained the following paragraph (C++17 Section 32.4 [atomics.order] paragraph 6):

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order::seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order.

Suppose we have a StoreLoad reordering litmus test:

// Thread 1
a.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst); // F1
b.load(std::memory_order_relaxed);

// Thread 2
b.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst); // F2
a.load(std::memory_order_relaxed);

We have two seq-cst fences F1 and F2. My understanding of the paragraph above is that if F1 is ordered before F2 in S, the a.load() by Thread 2 is guaranteed to observe the value from the a.store() by Thread 1 (or some later change). If however F2 is ordered before F1, then the b.load() by Thread 1 is guaranteed to observe the value written by the b.store() in Thread 2.

However, in an attempt to "Strengthen SC fences" for C++20, P0668R5 proposed to replace paragraphs 3-7 with:

An atomic operation A on some atomic object M is coherence-ordered before another atomic operation B on M if

  • A is a modification, and B reads the value stored by A, or
  • A precedes B in the modification order of M, or
  • A and B are not the same atomic read-modify-write operation, and there exists an atomic modification X of M such that A reads the value stored by X and X precedes B in the modification order of M, or
  • there exists an atomic modification X of M such that A is coherence-ordered before X and X is coherence-ordered before B.

There is a single total order S on all memory_order::seq_cst operations, including fences, that satisfies the following constraints. First, if A and B are memory_order::seq_cst operations and A strongly happens before B, then A precedes B in S. Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:

  • if A and B are both memory_order::seq_cst operations, then A precedes B in S; and
  • if A is a memory_order::seq_cst operation and B happens before a memory_order::seq_cst fence Y, then A precedes Y in S; and
  • if a memory_order::seq_cst fence X happens before A and B is a memory_order::seq_cst operation, then X precedes B in S; and
  • if a memory_order::seq_cst fence X happens before A and B happens before a memory_order::seq_cst fence Y, then X precedes Y in S.

[ Note: This definition ensures that S is consistent with the modification order of any atomic object M. It also ensures that a memory_order::seq_cst load A of M gets its value either from the last modification of M that precedes A in S or from some non-memory_order::seq_cst modification of M that does not happen before any modification of M that precedes A in S. -- end note ]

[ Note: We do not require that S be consistent with "happens before" (6.8.2.1 [intro.races]). This allows more efficient implementation of memory_order::acquire and memory_order::release on some machine architectures. It can produce surprising results when these are mixed with memory_order::seq_cst accesses. -- end note ]

But I fail to see how this definition provides the same guarantee as the C++17 paragraph above. Said paragraph essentially defines how seq-cst fences can ensure visibility of atomic operations. The C++20 definition states "(..) for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B (..)", so it requires A and B to be coherence-ordered. But according to the definition of coherence-ordered that requires B to observe the value written by A, i.e., instead of defining how changes can be made visible we are making visibility a precondition. Am I missing something?


Solution

  • Under the C++17 rules, you proved that if F1 precedes F2 in S, then a.load() in thread 2 observes the value 1 from a.store() in thread 1, or a value from a modification later in the modification order.

    Under the C++20 rules, we can prove the same statement by contrapositive.

    Suppose a.load() in thread 2 (call it A) does not read the value 1 from a.store() in thread 1 (call it B), nor any later modification. Since A must observe the value of some modification, and the modification order is a total order, it must read the value from some modification X which is earlier than B. Thus by the third bullet in the definition of coherence-order, we have that A is coherence-ordered before B.

    Now we can apply the fourth bullet of the requirements on S, taking X = F2 and Y = F1. We have that F2 happens before A and B happens before F1, so we conclude that F2 precedes F1 in S. Since S is a total order, this means that F1 does not precede F2.

    A similar argument could prove (again by contrapositive) that if F2 precedes F1 then b.load() in thread 1 observes the value stored by b.store() in thread 2.