c++ multithreading optimization clang data-race

Why does clang optimize out a loop polling a variable that another thread writes to?


While studying C++, I ran into something strange.
I thought the code below would print a very large number (or at least not 1.1).
Instead, the result was: [image: program output]

Other compilers worked as expected.
But Clang with aggressive optimization seems to ignore the while loop.
So my question is: what is wrong with my code? Or is this behaviour intended by Clang?

I used the Apple Clang compiler (v14.0.3).

#include <chrono>
#include <iostream>
#include <thread>


static bool should_terminate = false;

void infinite_loop() {
    long double i = 1.1;
    while(!should_terminate)
        i *= i;
    std::cout << i;
}

int main() {
    std::thread(infinite_loop).detach();
    std::cout << "main thread";
    for (int i = 0 ; i < 5; i++) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << ".";
    }
    should_terminate = true;
}

Assembly output from Compiler Explorer (clang v16.0.0, -O3).
It also appears to skip the while loop entirely:

_Z13infinite_loopv:                     # @_Z13infinite_loopv
        sub     rsp, 24
        fld     qword ptr [rip + .LCPI0_0]
        fstp    tbyte ptr [rsp]
        mov     rdi, qword ptr [rip + _ZSt4cout@GOTPCREL]
        call    _ZNSo9_M_insertIeEERSoT_@PLT
        add     rsp, 24
        ret

Solution

  • Your code has undefined behaviour:

    should_terminate is not an atomic object, so writing to it in one thread and accessing it in another thread potentially concurrently (i.e. without any synchronization) is a data race, which is always undefined behaviour.

    Practically speaking this UB rule permits the compiler to make exactly the optimization you see here.

    The compiler can assume that should_terminate will never change in the loop, because it cannot possibly be written to from another thread since that would be a data race. So when reaching the loop it is either false and stays false, so that the loop never terminates, or it is true, in which case the loop body doesn't execute at all.

    Then, because an infinite loop that doesn't perform any atomic/IO/volatile/synchronization operation would also have UB, the compiler can further deduce that should_terminate must be (always) true when the loop is reached. Consequently the loop body can never be executed and removing the loop is a permitted optimization.

    So Clang is behaving correctly here and your expectations are wrong. should_terminate must be a std::atomic<bool> (or std::atomic_flag) so that writing to it without synchronization against other accesses is not a data race.
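
A minimal sketch of that fix, under a few assumptions: the helper names spin_until_terminated and run_for are hypothetical, the loop counts iterations instead of squaring a long double, and the worker is joined rather than detached so that its result can be read back safely. The only change that matters for the data race is the type of should_terminate.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Atomic flag: a concurrent store and load on it is well-defined, so the
// compiler may no longer assume the value stays fixed while the loop runs.
static std::atomic<bool> should_terminate{false};

// Hypothetical helper: polls the flag and returns how many iterations ran.
unsigned long long spin_until_terminated() {
    unsigned long long iterations = 0;
    while (!should_terminate.load())  // atomic load on every iteration
        ++iterations;
    return iterations;
}

// Hypothetical helper: lets the worker spin for `duration`, then stops it.
unsigned long long run_for(std::chrono::milliseconds duration) {
    should_terminate.store(false);
    unsigned long long iterations = 0;
    std::thread worker([&iterations] { iterations = spin_until_terminated(); });
    std::this_thread::sleep_for(duration);
    should_terminate.store(true);  // atomic store, visible to the worker
    worker.join();                 // join() synchronizes with the worker's
                                   // completion, so reading `iterations`
                                   // afterwards is not a data race
    return iterations;
}
```

Calling run_for(std::chrono::seconds(5)) from main should return after roughly five seconds, once the worker has observed the store. The default sequentially consistent load() and store() are used here for simplicity; a weaker memory order would probably also do for a plain stop flag, but that is a separate tuning decision.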