How to verify a possible execution is OOTA?

Consider this example:

#include <thread>
#include <atomic>
int main(){
  std::atomic<int> x = 0, y = 0;
  auto t1 = std::thread([&](){
     if(x.load(std::memory_order::relaxed)==42){ // #1
        y.store(42,std::memory_order::relaxed); // #2
     }
  });
  auto t2 = std::thread([&](){
    while(y.load(std::memory_order::relaxed)==0); // #3
    x.store(42,std::memory_order::relaxed);  // #4
  });
  t1.join();
  t2.join();
}

This is a typical OOTA case that is a variant of the example in [atomics.order] p9. Because the computation of #1 circularly depends on itself.

Implementations should ensure that no “out-of-thin-air” values are computed that circularly depend on their own computation.

However, if changing t1 to the following:

  auto t1 = std::thread([&](){
     if(x.load(std::memory_order::relaxed)==42){ // #1
       y.store(42,std::memory_order::relaxed); // #2
     }else{
       y.store(42,std::memory_order::relaxed);
     }
  });

Or, the functionally equivalent form of the above

  auto t1 = std::thread([&](){
     x.load(std::memory_order::relaxed) // #1
     y.store(42,std::memory_order::relaxed); // #2
  });

Is the changed example OOTA? Intuitively, I think the subsequent changed examples are not OOTA; however, I cannot explain why. Even though there is no formal definition for what OOTA is, is there an informal way to explain their difference and why the first one is OOTA but the other are not?

I think we cannot distinguish/explain them from the perspective of the control flow. #2 depends on whether the control flow is returned from #1, in all three cases. In other words, #2 is unreachable if the execution of #1 is not completed(from the perspective of the abstract machine sense); For instance, #2 wouldn't be executed if #1 could throw an exception.

So, how to explain them?

Solution

This is, as the comments mention, an area of active research. I’ll recommend particularly §2.2 of How to Avoid OOTA Without Really Trying, which discusses the difficulty in defining what it means for one value to depend on another even in single-threaded code (which is obviously a necessary component of making any multi-threaded definition rigorous).

Here we have the additional complication of the unbounded set of operations in t2: our intuition, based on real implementations, is that a compiler might reorder a relaxed atomic operation past any fixed number of other operations but cannot move one past a loop that might never terminate. (This is similar to the idea that the order of global initialization is likely fixed for any one binary: a reasonable implementation has little reason to make that order non-deterministic even if it depends on the details of linking.) However, the wording one might use to specify that would either have to enumerate permissible transformations (which is a mire) or rely on counterfactuals to assign responsibility for an operation (which is also a mire since there’s no useful topology with which to define perturbed programs).

Nor are relaxed operations any stranger to non-causality: the current specification allows them to produce a pointer to memory that has not yet been allocated. The intent is for that example to simply have undefined behavior, so it is merely a sort of converse to the OOTA situations for which we want to constrain the implementation rather than the program. Having a cycle in the union of sequenced-before and writes-observed-value isn’t problematic in general; it’s “must be sequenced before” that we want, and that’s intuitively easy for these simple examples (“storing this value for y necessitates that this value was read for x, which I think is the answer to your stated question) but has resisted all attempts at providing an actual definition.