Can I infer the execution relationship between two evaluations across two threads in this way?

Consider this example:

std::atomic<bool> flag = false;
int arr[2] = {};

// thread 1:
arr[0] = 1; // A
flag.store(true,std::memory_order::relaxed); // B

// thread 2:
while(!flag.load(std::memory_order::relaxed)); // C
arr[1] = 2; // D

According to [intro.execution] p8

Given any two evaluations A and B, if A is sequenced before B (or, equivalently, B is sequenced after A), then the execution of A shall precede the execution of B.

C is sequenced before D, so the execution of C shall precede the execution of D. This implies that, if D is executed, C must read true so that the loop exits and completes its execution.

According to [intro.races] p10

The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.

Because C does not happen before B, C can read the value written by B such that the loop exits. This implies that, if the loop exits, B is executed to produce the side effect.

Similarly, according to [intro.execution] p8, the execution of A shall precede the execution of B. This implies that, if B is executed, then A has been executed.

We can get three conclusions as follows:

D is executed -> C exits
C exits -> B is executed
B is executed -> A has been executed

Summarizing the above, we can conclude that:

D is executed -> A has been executed

Or, equivalently, by contraposition:

If A hasn't been executed, then D is not executed.

Is this inferred conclusion right?

Update:

By looking at the comments, it seems that people tend to talk about order in this question; however, this question is not about order. The reason why I referenced [intro.execution] p8 is to formally prove why D is unreachable if the loop doesn't exit.

C and D are both executed by the single thread 2, and there are no other threads that would execute C and D, so in thread 2, if the loop doesn't exit, then D won't be executed according to [intro.execution] p8.

The reference to [intro.races] p10 is to formally prove that there must exist a side effect writing true if the loop exits.

The only issue is with the phrase "has been executed". As pointed out by @PeterCordes, without extra clarification, "has been executed" can lead people to interpret it as a statement about order, either order on a timeline or order in the sense of the ISO C++ standard.

Solution

(There are a lot of different ways to try to address this. I went through many edits trying to pick the approach I liked best, but there's still some redundancy after editing down my arguments. Towards the end of the answer, I left in short forms of some of my other attempts at an answer.)

If D executes, you can infer that A executes sometime in the lifetime of the program, but not its order relative to D. They're unordered and potentially concurrent. And their side-effects can become visible to other threads in any order (e.g. if we had std::atomic_int arr[2] so it was possible to observe their relative orders without synchronization from a third thread, or from thread 2 reading arr[0] before or after its store.) It's not correct to say that D implies "A has been executed."
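To make the "observable in any order" point concrete, here is a minimal sketch of the variant mentioned above, assuming arr is changed to an array of std::atomic<int> and thread 2 reads arr[0] right after its loop exits. The helper name observe_relaxed is hypothetical, and std::memory_order_relaxed is just the pre-C++20 spelling of std::memory_order::relaxed. The abstract machine allows the read to return either 0 or 1, because nothing orders A before it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the two threads once and returns what
// thread 2 read from arr[0] after seeing flag == true.
int observe_relaxed() {
    std::atomic<bool> flag{false};
    std::atomic<int> arr[2] = {{0}, {0}};  // assumption: atomic elements
    int seen = -1;

    std::thread t1([&] {
        arr[0].store(1, std::memory_order_relaxed);    // A
        flag.store(true, std::memory_order_relaxed);   // B
    });
    std::thread t2([&] {
        while (!flag.load(std::memory_order_relaxed)) {}  // C
        // No happens-before from A to this load: 0 and 1 are both allowed.
        seen = arr[0].load(std::memory_order_relaxed);
        arr[1].store(2, std::memory_order_relaxed);    // D
    });
    t1.join();
    t2.join();

    assert(seen == 0 || seen == 1);  // the only two values ever stored
    return seen;
}
```

On strongly-ordered hardware such as x86 you will probably only ever see 1 in practice, but that is a property of one implementation, not a guarantee of the standard.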

The abstract machine has three possibilities for the relative order of two things being executed: before, concurrent/unordered, and after. In the formalism of the standard, it's not true that if something isn't before, it must be after. If you want more granularity within "concurrent", you have to talk about real implementations, and that's a whole different kettle of fish.

For example, a real implementation can do StoreStore reordering of the non-atomic (or atomic relaxed) A and the relaxed B, including at compile time. So on some real machine it could be possible to say for certain that A has not executed when D has (e.g. after single-stepping the asm of different threads with a debugger).

But this is after transformations allowed by the as-if rule. Potentially you only intended to argue about the abstract machine. Does execution in the abstract machine have any meaning separate from visibility? Yes for sequencing rules within a thread, but no for the purpose of discussing order between threads. It seems like a meaningless concept to me, just a source of confusion.

There's absolutely no visibility guarantee for the side-effects of A here, due to the lack of a happens-before. C is a relaxed load seeing the value from a relaxed store B. They need to be acquire/release or stronger before we can say that B happens-before C: https://eel.is/c++draft/intro.races#7 - sequenced-before and synchronizes-with, or a transitive combination of the two, are the only ways to get happens-before. So if you mean execution in terms of visibility, the way the standard uses the term, no, you don't get that.
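For contrast, here is a minimal sketch of the same two threads with B as a release store and C as an acquire load, which is what it takes to get B synchronizes-with C and therefore A happens-before D. The helper name run_release_acquire is hypothetical; the read of arr[0] in thread 2 is added only to show that A's side effect is now guaranteed to be visible there.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: release/acquire version of the question's example.
// Returns what thread 2 read from the non-atomic arr[0] after the loop.
int run_release_acquire() {
    std::atomic<bool> flag{false};
    int arr[2] = {};
    int seen = -1;

    std::thread t1([&] {
        arr[0] = 1;                                     // A
        flag.store(true, std::memory_order_release);    // B (release)
    });
    std::thread t2([&] {
        while (!flag.load(std::memory_order_acquire)) {}  // C (acquire)
        // C read the value stored by B, so B synchronizes-with C and
        // A happens-before this read: no data race, and it must see 1.
        seen = arr[0];
        arr[1] = 2;                                     // D
    });
    t1.join();
    t2.join();

    assert(seen == 1);
    return seen;
}
```

With relaxed on both sides, as in the question, that read of arr[0] in thread 2 would be a data race on a non-atomic object, which is undefined behaviour.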

You're stringing together sequenced-before (happens-before), coherence-ordered before (C sees B), and another sequenced before, to actually get zero guarantees between A and D.

You quoted the sequenced-before rule from [intro.execution], which uses "execution" in a single-threaded or sequentially-consistent sense of the word, including guaranteed visibility of all its side-effects to everything sequenced after.
But in the rest of what you wrote, you're using the fact that C happens to see B's side-effect to argue the reverse, that it must have "executed after" in some sense, despite the lack of synchronization.

The most the standard lets us say about the abstract machine is that B is coherence-ordered before C, but that doesn't imply anything else because neither of them is seq_cst. And by the coherence rules (which apply to each object separately), any later reads or writes of flag in thread 2 will come later in the modification order of flag than B and C.

intro.races p7: happens-before is either sequenced-before, synchronizes-with (release/acquire or stronger), or transitive through something else. Since you don't have any of those between A and D, any definition of "execution" which allows you to say A executed before D is confusingly different from the one sequenced-before is using.

Your program uses atomics with weaker ordering than seq_cst, so there doesn't have to be a global order of operations that explains your program's behaviour. There is no global timeline, as @HolyBlackCat says, only timelines within each thread, each separately obeying the as-if rule. Without an inter-thread happens-before to tie them together, you can't say anything about relative orders across threads.

[intro.races]p10: The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.

[your reasoning] Because C does not happen before B, C can read the value written by B such that the loop exits. This implies that, if the loop exits, B is executed to produce the side effect.

At some point in the program's execution, B is executed. This point is not necessarily before C in any global timeline. As @HolyBlackCat commented, no such timeline exists in the abstract machine which the standard describes, except in programs which use no atomic ordering weaker than seq_cst and thus execute in a sequentially-consistent manner. (Of course assuming no UB, including no data-race UB.)

You are using relaxed atomics, so you don't get to apply any common-sense rules of causality and ordering to the abstract machine, only what the formalism in the standard strictly gives you.

(Reasoning based on causality can be useful in thinking about real machines, but then you'd want to think in terms of allowed reorderings. A and B are allowed to reorder because B is relaxed and A is non-atomic.)

Because C does not happen before B, C can read the value written by B such that the loop exits. This implies that, if the loop exits, B is executed to produce the side effect.

It also doesn't say that C must have executed after B in any sense to see the value. Only that it wasn't required (by a happens-before) to not see it. B and C being unsynchronized is sufficient for C to be allowed to see it.

If you're thinking that C not happening before B must mean it "happens after" in standardese, then no, it doesn't. That would only follow if B were release or stronger and C were acquire or stronger, so they could synchronize-with each other (https://eel.is/c++draft/intro.races#7.2).

In terms of a real implementation, unsynchronized and possibly-concurrent leaves room for things like branch prediction and speculative execution, which allows for execution of later stuff before a load value is actually available to check the prediction that the loop stops looping.

A and D are unordered / unsynchronized, no happens-before between them. If arr was instead a collection like a set or map where assignment to different keys needs to be synchronized, this would be data-race UB. In C++ there's no legal way to observe the order of these assignments to non-atomic array elements.

Possibly your error is interpreting "shall precede" in [intro.execution] to mean "happens before from the PoV of other threads even without synchronization". But [intro.execution] is just talking about within a single thread. You didn't quote the first sentence of that paragraph:

Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread ([intro.multithread]), which induces a partial order among those evaluations.

Sequenced-before is one type of happens-before, but to connect it across threads you need synchronization (acq/rel), not just a relaxed load happening to see a value from a relaxed store.

To be fair, sequenced-before also implies strongly-happens-before, and we have [Note 8: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. — end note].

It is indeed true that A strongly happens before B, but other threads can only observe that if they sync-with thread 1 somehow. Thread 2 doesn't.
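As a small illustration of what "sync-with thread 1 somehow" can look like, here is a minimal sketch in which the observing thread joins thread 1 instead of spinning on the relaxed flag. The helper name observe_after_join is hypothetical. The completion of a thread synchronizes with the return from join(), so both A and B happen-before the reads that follow it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: the caller synchronizes with thread 1 via join(),
// so it is guaranteed to observe the side effects of both A and B.
bool observe_after_join() {
    std::atomic<bool> flag{false};
    int arr[2] = {};

    std::thread t1([&] {
        arr[0] = 1;                                    // A
        flag.store(true, std::memory_order_relaxed);   // B
    });
    t1.join();  // thread completion synchronizes-with the return from join()

    // A and B both happen-before these reads, even though B is relaxed.
    bool saw_a = (arr[0] == 1);
    bool saw_b = flag.load(std::memory_order_relaxed);
    assert(saw_a && saw_b);
    return saw_a && saw_b;
}
```

Thread 2 in the question has nothing like this: its only connection to thread 1 is a relaxed load that happens to read from a relaxed store.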