c++ language-lawyer compiler-optimization memory-model stdatomic

Can atomic loads be merged in the C++ memory model?


Consider the C++11 snippet below. With GCC and clang this compiles to two (sequentially consistent) loads of foo. (Editor's note: in practice compilers do not optimize atomics; see this Q&A for more details, especially the http://wg21.link/n4455 standards discussion about the problems such optimizations could create, which the standard gives programmers no tools to work around. This language-lawyer Q&A is about what the current standard allows, not about what compilers actually do.)

Does the C++ memory model allow the compiler to merge these two loads into a single load and use the same value for x and y?

(Editor's note: this is something the standards group is working on: http://wg21.link/n4455 and http://wg21.link/p0062. The current standard on paper allows behaviours that are undesirable.)


I think it cannot merge these loads, because that would mean polling an atomic no longer works, but I cannot find the relevant part of the memory-model wording.

#include <atomic>
#include <cstdio>

std::atomic<int> foo;

int main(int argc, char **argv)
{
    int x = foo;
    int y = foo;

    printf("%d %d\n", x, y);
    return 0;
}

Solution

  • Yes, because we cannot observe the difference!

    An implementation is allowed to turn your snippet into the following (pseudo-implementation).

    int __loaded_foo = foo;  // a single seq_cst load of foo

    int x = __loaded_foo;    // both x and y reuse the loaded value
    int y = __loaded_foo;
    

    The reason is that there is no way for you to observe the difference between the above and two separate loads of foo, given the guarantees of sequential consistency.

    Note: It is not just the compiler that can make such an optimization; the processor can equally reason that there is no way for you to observe the difference and load the value of foo only once, even though the compiler may have asked it to do so twice.


    Explanation

    Given a thread that keeps updating foo in an incremental fashion, what you are guaranteed is that y will hold either the same value as x or a later-written one.

    // thread 1 - The Writer
    while (true) {
      foo += 1;
    }
    
    // thread 2 - The Reader
    while (true) {
      int x = foo;
      int y = foo;
    
      assert (y >= x); // will never fire, unless foo wraps around after reaching its max value
    }
    

    Imagine that the writing thread for some reason pauses its execution on every iteration (because of a context switch or some other implementation-defined reason); there is no way you can prove that this is what caused both x and y to have the same value, rather than a "merge optimization".


    In other words, we have two potential outcomes given the code in this section:

    1. No new value is written to foo between the two reads (x == y).
    2. A new value is written to foo between the two reads (x < y).

    Since either of the two can happen, an implementation is free to narrow the scope and simply always execute one of them; there is no way we can observe the difference.
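
    Here is a minimal, runnable sketch of that experiment, assuming std::thread and a bounded iteration count so the program terminates (both the thread objects and the iteration limit are my additions; the original snippets use infinite loops):

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> foo{0};

    int main()
    {
      // The Writer: keeps incrementing foo.
      std::thread writer([] {
        for (int i = 0; i < 1000000; ++i)
          foo += 1;        // sequentially consistent read-modify-write
      });

      // The Reader: two back-to-back loads; the second may be merged with the first.
      std::thread reader([] {
        for (int i = 0; i < 1000000; ++i) {
          int x = foo;     // seq_cst load
          int y = foo;     // seq_cst load, possibly folded into the previous one
          assert(y >= x);  // holds because foo never wraps around here
        }
      });

      writer.join();
      reader.join();
      return 0;
    }

    Whether or not the two loads in the reader are merged, the assertion can never fire; that is exactly why the merge is unobservable.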


    What does the Standard say?

    An implementation can make whatever changes it wants, as long as we cannot observe any difference between the behavior we expressed and the observable behavior of the actual execution.

    This is covered in [intro.execution]p1:

    The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

    Another section that makes it even clearer is [intro.execution]p5:

    A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.



    What about polling in a loop?

    // initial state
    std::atomic<int> foo{0};  // brace-init; "= 0" (copy-initialization) is ill-formed before C++17
    
    // thread 1
    while (true) {
      if (foo)
        break;
    }
    
    // thread 2
    foo = 1;
    

    Question: Given the reasoning in the previous sections, could an implementation simply read foo once in thread 1, and then never break out of the loop even if thread 2 writes to foo?

    The answer: no.

    In a sequentially consistent environment we are guaranteed that a write to foo in thread 2 will become visible to thread 1; once that write has happened, thread 1 must be able to observe this change of state.

    Note: An implementation can turn two reads into a single one because we cannot observe the difference (one fence is just as effective as two), but it cannot completely disregard a read that exists by itself.

    Note: The contents of this section are guaranteed by [atomics.order]p3-4.
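
    For completeness, a minimal runnable sketch of the polling example, assuming std::thread and a short sleep before the store (the sleep is only there to make the interleaving obvious; it is not required for correctness):

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    std::atomic<int> foo{0};

    int main()
    {
      // thread 1 - The Poller: spins until it observes a non-zero foo.
      std::thread poller([] {
        while (true) {
          if (foo)
            break;
        }
        printf("observed foo != 0\n");
      });

      // thread 2 - The Writer: this store must become visible to the poller.
      std::thread writer([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        foo = 1;
      });

      poller.join();
      writer.join();
      return 0;
    }

    The load inside the poller's loop cannot be hoisted out and reduced to a single read of a stale value, so the program is guaranteed to terminate.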


    What if I really want to prevent this form of "optimization"?

    If you would like to force the implementation to actually read the value of a variable at every point where you have written a read of it, you should look into using volatile (note that this in no way enhances thread-safety).

    But in practice compilers don't optimize atomics, and the standards group has recommended against using volatile atomic for this kind of reason until the dust settles on this issue; see the WG21 papers linked above (http://wg21.link/n4455 and http://wg21.link/p0062).
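
    For illustration only, and with the caveat above in mind, here is a minimal sketch of that approach. In practice compilers treat accesses to a volatile std::atomic like other volatile accesses and do not merge them, although how firmly the standard itself pins this down is part of the discussion in the papers linked above:

    #include <atomic>
    #include <cstdio>

    volatile std::atomic<int> foo{0};  // volatile: each access is meant to be performed as written

    int main()
    {
      int x = foo;  // an actual load
      int y = foo;  // a second, distinct load

      printf("%d %d\n", x, y);
      return 0;
    }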