Is it conforming observable behavior, in the abstract machine's sense, for a load to read a value that has not been produced yet?

Consider this example:

#include <atomic>
#include <iostream>
#include <chrono>
#include <thread>
#include <cassert>

int main(){
  std::atomic<int> val = {0};
  auto t1 = std::thread([&]() {
    auto r = val.load(std::memory_order::relaxed); // #1
    assert(r == 1);
  });
  auto t2 = std::thread([&]() {
    std::this_thread::sleep_for(std::chrono::milliseconds(6000)); // #2
    val.store(1, std::memory_order::relaxed); // #4
  });
  t1.join();
  auto now1 = std::chrono::high_resolution_clock::now();
  std::cout<<"t1 completed\n";
  t2.join();
  auto now2 = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> duration = now2 - now1;
  // duration >=6
}

From the abstract machine's perspective, is it possible to observe that #1 loads 1 and "t1 completed" is printed immediately (i.e., t1 completes quickly), while t2 executes #4 only six seconds after #2 starts? In other words, #1 would read a value that t2 has not yet produced at that point.

[intro.races] p10 says:

The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.

From the perspective of memory ordering, #1 and #4 are unordered (in terms of happens-before), so the value stored by #4 is a possible value for #1 to read. From the perspective of execution, however, t2 executes #4 only after at least six seconds, so #4 has not yet been executed when t1 completes. The abstract machine cares only about ordering, not about the timeline; still, under the described outcome, #1 reads a value that t2 has not produced by the time t1 completes. So I wonder: is the described outcome conforming observable behavior from the pure abstract machine's perspective?
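For contrast, here is a minimal sketch of a case that the quoted rule does exclude. The helper name `load_then_store` is hypothetical and not part of the program above. It assumes the usual guarantees that a thread's completion synchronizes with the return of `join()` and that the `std::thread` constructor synchronizes with the start of the new thread's function. Under those assumptions the load happens before the store, so the load cannot take its value from that store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration only.
int load_then_store() {
    std::atomic<int> val{0};
    int r = -1;

    // Evaluation B: the relaxed load, run in its own thread.
    std::thread t1([&] { r = val.load(std::memory_order_relaxed); });
    t1.join();  // t1's completion synchronizes with join() returning

    // Side effect A: the relaxed store, in a thread started only after
    // t1 was joined, so B happens before A.
    std::thread t2([&] { val.store(1, std::memory_order_relaxed); });
    t2.join();

    // The store happens before this point, so main must see it here.
    assert(val.load(std::memory_order_relaxed) == 1);

    return r;  // B happens before A, so B cannot have read A's value
}
```

In the program from the question there is no such chain between #1 and #4, which is why the rule does not rule out #1 reading 1.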

Solution

This program doesn't check that t1 exits quickly, so the as-if rule basically allows having it both ways: reading the value from the later store, but also not actually waiting for it.

One hypothetical mechanism would be value prediction for the load, confirmed only much later. Real CPUs don't do inter-thread speculation, but the abstract machine doesn't forbid it in most cases, apart from the prohibition on out-of-thin-air values (which this is not).

Or consider a machine with a 1 Hz clock, so that 6000 ms is only a handful of cycles. (Although then you probably couldn't say that t1 exited quickly. That claim has no formal implications anyway. I guess you could check now() after t1.join() and again after t2.join(), and check that the interval was close to 6 seconds. AFAIK that wouldn't introduce anything that would stop the abstract machine from doing this; now() doesn't synchronize with anything.)
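The timing check described above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive test: `Observation` and `run_once` are hypothetical names, the delay is a parameter so the sketch can run with a short sleep, and `std::chrono::steady_clock` is used because it is monotonic. The loaded value is recorded instead of asserted to be 1, so the sketch doesn't abort when the load reads 0.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical record of one run: what the load saw and how long
// main waited between t1.join() and t2.join() returning.
struct Observation {
    int loaded;
    std::chrono::duration<double> gap;
};

// Hypothetical helper mirroring the program in the question.
Observation run_once(std::chrono::milliseconds delay) {
    std::atomic<int> val{0};
    int loaded = -1;

    std::thread t1([&] { loaded = val.load(std::memory_order_relaxed); });
    std::thread t2([&] {
        std::this_thread::sleep_for(delay);
        val.store(1, std::memory_order_relaxed);
    });

    t1.join();  // after this, reading `loaded` in main is race-free
    const auto after_t1 = std::chrono::steady_clock::now();
    t2.join();
    const auto after_t2 = std::chrono::steady_clock::now();

    // Only 0 and 1 are ever stored to val, so nothing else may be read.
    assert(loaded == 0 || loaded == 1);

    return {loaded, after_t2 - after_t1};
}
```

Calling `run_once(std::chrono::milliseconds(6000))` and inspecting `loaded` and `gap.count()` gives the two observations discussed here. Nothing in the sketch adds a happens-before edge between the load and the store, since the clock reads don't synchronize with anything, so as far as I can tell the standard permits either loaded value regardless of the measured gap.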