c++ c++11 memory concurrency

C++11 memory model: the compiler is not allowed to introduce new data races


Watching a CppCon talk by Michael Wong from 2015 (called "C++11/14/17 atomics and memory model...", around minute 33:00), he said two sentences that I didn't understand:

No compiler transformation is allowed to introduce a data race (more restrictions on invented writes and possibly fewer speculative stores and potentially loads)

What kinds of "invented writes" and "speculative stores and loads" did compilers do? Which of those were common before the C++11 memory model but are now forbidden? Were those important optimizations that are now lost, or did they have only a tiny impact anyway?

There are atomic memory operations that don't cause races (or they race but are well-defined)

What does he mean here by "there are"? Is he referring to the hardware, or to some C++ built-in operations? Which memory operations can "cause races but are well-defined"?


Solution

  • There's an example in N2338. The original code is

    switch (y) {
        case 0: x = 17; w = 1; break;
        case 1: x = 17; w = 3; break;
        case 2: w = 9; break;          // the only path that does not write x
        case 3: x = 17; w = 1; break;
        case 4: x = 17; w = 3; break;
        case 5: x = 17; w = 9; break;
        default: x = 17; w = 42; break;
    }
    

    and a compiler might potentially want to optimize it to:

    tmp = x; x = 17;                   // speculative store: save x, then write 17
    switch (y) {
        case 0: w = 1; break;
        case 1: w = 3; break;
        case 2: x = tmp; w = 9; break; // undo the speculative store
        case 3: w = 1; break;
        case 4: w = 3; break;
        case 5: w = 9; break;
        default: w = 42; break;
    }
    

    This transformation reduces the number of instructions (code size). It can be thought of as speculatively setting x to 17 (because that's what happens most of the time) and then undoing the effect of that store if the branch is taken along which x is not supposed to be modified. In single-threaded code, this transformation is a valid optimization that does not change the behaviour of the program.

    However, suppose that in cases where y is equal to 2, this block of code may execute concurrently with another thread that reads the value of x. This block of code is not supposed to write to x in this case, so it is safe for the other thread to read x. If the compiler transforms this block so that it performs a speculative write to x, that write may now be concurrent with the read in the other thread, which causes undefined behaviour (assuming x is not an atomic variable). Thus, the transformation potentially introduces undefined behaviour into a program that did not have any originally.
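
    To make this concrete, here is a minimal sketch (the thread setup is my own illustration, not from N2338) of a program that is race-free with the original code but would have a data race with the transformed code:

    #include <thread>

    int x = 0;
    int w = 0;

    void original(int y) {
        // Abbreviated version of the switch above: when y == 2,
        // this function never touches x.
        switch (y) {
            case 2: w = 9; break;
            default: x = 17; w = 42; break;
        }
    }

    int main() {
        std::thread t1(original, 2);                // never writes x when y == 2
        std::thread t2([] { int r = x; (void)r; }); // reading x here is fine
        t1.join();
        t2.join();
        // With the transformed code, t1 would speculatively write x = 17 and
        // then restore it, turning t2's read into a data race (undefined
        // behaviour).
    }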

    I don't know how much impact these optimizations ever had in the first place; I suspect not much, because if they were actually significant, then GCC would probably have a flag that lets you promise your code is single-threaded in exchange for getting those optimizations back. But as far as I can tell, it doesn't. (Although, interestingly, libstdc++ does do some optimizations at the library level; see here.)
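
  • As for the second quoted sentence: under the C++11 model, concurrent unsynchronized accesses to std::atomic objects are, by definition, not data races. They may still "race" in the informal sense that the result depends on timing, but the behaviour is well-defined: each load returns one of the values permitted by the chosen memory order. A minimal sketch (my own illustration):

    #include <atomic>
    #include <thread>

    std::atomic<int> a{0};

    int main() {
        // Unsynchronized concurrent accesses to 'a'. With a plain int this
        // would be a data race (undefined behaviour); with std::atomic the
        // behaviour is well-defined: the load returns either 0 or 1.
        std::thread t1([] { a.store(1, std::memory_order_relaxed); });
        std::thread t2([] { int r = a.load(std::memory_order_relaxed); (void)r; });
        t1.join();
        t2.join();
    }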