Can the hardware reorder an atomic load followed by an atomic store, if the store is conditional on the load? It would be highly unintuitive if this could happen. Suppose thread1 writes y = 1 speculatively (because of branch prediction, or for any other reason) before the condition in the if statement is confirmed to be true. If it later finds out that the condition is false, there is no way to undo the damage (roll back the change), because another thread may already have read the updated value of y!
I don't think release and acquire semantics will help in this case. A std::memory_order_release on y.store() would make thread1's earlier writes visible to other threads that see the new value of y, but here there is nothing thread1 can do to affect the visibility of x on other threads, because x was not written by thread1. The examples I've seen in the C++ documentation for acquire and release semantics only involve a pair of threads, where one is purely a writer and the other is purely a reader.
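For reference, the one-writer/one-reader pattern that those documentation examples describe looks roughly like the following. This is a minimal sketch; payload, ready, writer, reader and run_message_passing are hypothetical names introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: payload, ready, writer, reader, run_message_passing.
int payload = 0;            // plain (non-atomic) data written by the writer
std::atomic<int> ready{0};  // flag published with a release store

void writer() {
    payload = 42;                               // ordinary write
    ready.store(1, std::memory_order_release);  // orders this thread's earlier writes before the flag
}

int reader() {
    while (ready.load(std::memory_order_acquire) != 1) { }  // spin until the flag is seen
    // The acquire load synchronizes-with the release store, so this read sees 42.
    return payload;
}

// Runs one writer/reader pair and returns the value the reader observed.
int run_message_passing() {
    payload = 0;
    ready.store(0, std::memory_order_relaxed);  // reset before any thread starts
    int seen = -1;
    std::thread t1(writer);
    std::thread t2([&seen] { seen = reader(); });
    t1.join();
    t2.join();
    return seen;
}
```

In this sketch the release store orders only writer's own earlier write to payload; it says nothing about data written by some third thread.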
Example:
#include <atomic>

std::atomic<int> x;
std::atomic<int> y;

void thread1() {
    int val = x.load(std::memory_order_relaxed);
    if( val == 42 ){
        y.store(1, std::memory_order_relaxed);
    }
}
EDIT: I've been asked to provide examples including other threads. One is about ordering of stores as perceived by observers, whilst the other is about the possibility of speculative execution of stores.
After writing these examples and thinking about the problem some more, I can clarify the question further:
// This example is about ordering.
// Will thread3 see the new value of x, if thread1 has seen it?
#include <atomic>
#include <cassert>

std::atomic<int> x;
std::atomic<int> y;

void thread1() {
    int val = x.load(std::memory_order_relaxed);
    if( val == 42 ){
        y.store(1, std::memory_order_relaxed);
    }
}

void thread2() {
    x.store(42, std::memory_order_relaxed);
}

void thread3() {
    if( y.load(std::memory_order_relaxed) == 1 ){
        std::atomic_thread_fence(std::memory_order_acquire);
        int r = x.load(std::memory_order_relaxed);
        assert(r == 42); // Can this go wrong?
    }
}
// Similar, but we're putting it inside a loop
// to make it a little more interesting.
// This focuses on speculative execution.
#include <atomic>
#include <chrono>
#include <thread>

// Placeholders for irreversible, externally visible side effects.
void Open_Pandoras_Box();
void Launch_Nuclear_Missiles();
void Unleash_Armageddon();

std::atomic<int> x;
std::atomic<int> y;
std::atomic<bool> terminate;

void thread1() {
    while( !terminate.load(std::memory_order_relaxed) ){
        int val = x.load(std::memory_order_relaxed);
        if( val == 42 ){
            // Can y.store() be done speculatively?
            y.store(1, std::memory_order_relaxed);
            break;
        }
    }
}

void thread2() {
    // This will count down and write to x,
    // but should never actually store the value 42.
    for(int idx = 10000000; idx > 42; --idx){
        x.store(idx, std::memory_order_relaxed);
    }
    std::this_thread::sleep_for(std::chrono::seconds(10));
    terminate = true;
}

void thread3() {
    while( !terminate.load(std::memory_order_relaxed) ){
        if( y.load(std::memory_order_relaxed) == 1 ){
            // This should never happen!
            Open_Pandoras_Box();
            Launch_Nuclear_Missiles();
            Unleash_Armageddon();
            break;
        }
    }
}
As you mentioned in your comment, let's suppose we have two other threads:
#include <atomic>
#include <cassert>

std::atomic<int> x;
std::atomic<int> y;
int xx1, xx3, yy;

void thread1() {
    xx1 = x.load(std::memory_order_relaxed);
    if( xx1 == 42 ){
        y.store(1, std::memory_order_relaxed);
    }
}

void thread2() {
    x.store(42, std::memory_order_relaxed);
}

void thread3() {
    while (y.load(std::memory_order_acquire) != 1) { }
    xx3 = x.load(std::memory_order_acquire);
    assert(xx3 == 42);
}
ISO C++ allows the assert to fire.
By default in the C++ memory model, an atomic load may return any value that is stored to that object at any point in the program's execution, unless the coherence rules forbid it. So the only way to prove that the load into xx3 (in thread3) cannot return 0 would be to prove that the store of 42 to x in thread2 happens-before the thread3 load. But this cannot be done: there are no release stores in the entire program, nor any other way to establish synchronization, so it is not possible to prove a happens-before relationship between any two operations in different threads.
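As a sketch of the converse (not part of the original program): thread3 already loads y with acquire, so what is missing are release stores. If thread2's store to x were a release, thread1's load of x an acquire, and thread1's store to y a release, the two synchronizes-with edges would chain. Because happens-before is transitive, the store of 42 would then happen-before thread3's load of x, and that load would have to return 42. In the sketch below, y_seen and run_trial are hypothetical helpers, and thread3's spin loop is replaced by a single load so that every trial terminates even when thread1 runs before thread2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Sketch only: x, y, xx1, xx3 follow the answer; y_seen and run_trial are hypothetical.
std::atomic<int> x{0};
std::atomic<int> y{0};
int xx1, xx3, y_seen;

void thread1() {
    xx1 = x.load(std::memory_order_acquire);    // synchronizes-with thread2 if it reads 42
    if (xx1 == 42) {
        y.store(1, std::memory_order_release);  // passes that happens-before edge on to thread3
    }
}

void thread2() {
    x.store(42, std::memory_order_release);
}

void thread3() {
    // Single load instead of a spin loop, so the trial always finishes.
    y_seen = y.load(std::memory_order_acquire);
    if (y_seen == 1) {
        xx3 = x.load(std::memory_order_relaxed);  // x.store(42) happens-before this load
    }
}

// Returns true if the outcome respects "y_seen == 1 implies xx3 == 42".
bool run_trial() {
    x.store(0);
    y.store(0);
    xx1 = xx3 = y_seen = 0;
    std::thread a(thread1), b(thread2), c(thread3);
    a.join();
    b.join();
    c.join();
    return y_seen != 1 || xx3 == 42;
}
```

A passing run is only consistent with the guarantee, not a proof of it; the guarantee itself comes from the memory model.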
On the other hand, it's still true that an if statement means what you think it does. If the controlling condition is false, the controlled statement does not execute. If that statement is an atomic store, then the value it would have stored must not be loaded from that object anywhere in the program (unless, of course, the same value is stored elsewhere), because the store never happens. So if the assert does fire, then xx1 must contain the value 42 (when read far enough in the future without a data race, e.g. after joining thread1). The outcome xx1 == 0 && assert_fires is forbidden, as it's logically impossible.
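The implication that is guaranteed can be sketched with the answer's all-relaxed program. Assumptions: thread3 loads y once instead of spinning (so every trial terminates, even when thread1 runs before thread2), and y_seen and run_relaxed_trial are hypothetical helpers. The check deliberately tests only "y_seen == 1 implies xx1 == 42", not xx3 == 42, since ISO C++ allows the latter to fail.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Sketch only: x, y, xx1, xx3 follow the answer; y_seen and run_relaxed_trial are hypothetical.
std::atomic<int> x{0};
std::atomic<int> y{0};
int xx1, xx3, y_seen;

void thread1() {
    xx1 = x.load(std::memory_order_relaxed);
    if (xx1 == 42) {
        y.store(1, std::memory_order_relaxed);  // executes only if the load returned 42
    }
}

void thread2() {
    x.store(42, std::memory_order_relaxed);
}

void thread3() {
    y_seen = y.load(std::memory_order_relaxed);
    if (y_seen == 1) {
        xx3 = x.load(std::memory_order_relaxed);  // NOT guaranteed to be 42 by ISO C++
    }
}

// Returns true if "y_seen == 1 implies xx1 == 42" holds; xx3 is intentionally not checked.
bool run_relaxed_trial() {
    x.store(0);
    y.store(0);
    xx1 = xx3 = y_seen = 0;
    std::thread a(thread1), b(thread2), c(thread3);
    a.join();  // joining makes the plain ints safe to read afterwards
    b.join();
    c.join();
    return y_seen != 1 || xx1 == 42;
}
```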
But although there is a control relationship between the load and the store in thread1, which must be respected, it constrains only the logic of the execution, not its timing; indeed, there is no such concept as time in the C++ memory model (other than a couple of points related to steady clocks). So nothing forbids thread3's load of y from returning 1 at an earlier instant in time than when x.store(42) executes.
If we turn from abstract memory models to real CPUs with physical memory attached, operating in normal space-time, I agree it's difficult to imagine a practical implementation that could behave in this way. Core 1 is theoretically allowed to "publish" its store of 1 to y before having completed the load from x, but only if it can be absolutely certain, through some sort of prescience, that said load will inevitably return 42. (For instance, perhaps it proved this via static analysis of the program.) Alternatively, it could publish the store speculatively, without knowing what value will be loaded from x, if it had a magical mechanism to "roll back" that store and all its effects, including those that may have taken place on other cores, in case it turned out to have guessed wrong.
Of course, a normal real-life machine does not have any way to know for certain what value will be loaded from x until that load actually completes. And, as you said, a normal real-life machine does not have a way to roll back work done by other cores (which could even include externally visible I/O). As such, a normal real-life machine cannot make the store to y globally visible until it is non-speculative, which can only occur after the load of x completes and its value is known.