Tags: c++, atomic, std-atomic

Busy waiting with std::atomic<bool>


Can the compiler hoist checks of an atomic variable out of a tight infinite loop?

#include <atomic>

std::atomic<bool> ready{false};
void run() {
    while (!ready) {}
    // ... do something
}

Does the compiler have the freedom to rearrange the above code into this?

std::atomic<bool> ready{false};
void run() {
    bool is_ready = ready;
    while (!is_ready) {}
    // ... do something
}

In https://www.youtube.com/watch?v=F6Ipn7gCOsY&ab_channel=CppCon at 17:24, Arthur O'Dwyer says the compiler can rearrange it this way. That seems quite counter-intuitive. If I change the check to ready.load(), would that prevent the compiler from rearranging the code? If not, then how do CAS operations work in a while loop? Couldn't the compiler rearrange those in a similar fashion?


Solution

  • I had a look at the video for context. Arthur got this wrong: this transformation is not valid.

    Compilers have to assume that other threads are changing the values of std::atomic objects between reads, because, as he says, a concurrent read + write is not data-race UB. (For non-atomic variables, data-race UB is what allows this transformation, which compilers do perform in practice.)

    [intro.progress] p18:

    An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

    [atomics.order] p11:

    Recommended practice: The implementation should make atomic stores visible to atomic loads, and atomic loads should observe atomic stores, within a reasonable amount of time.

    These are only "should" requirements in the ISO C++ standard, but this transformation would make an implementation useless in practice.

    The reason those requirements aren't stronger than "should" is to allow running multi-threaded programs on a heavily loaded OS with a scheduler that doesn't guarantee fairness, e.g. one that can starve a thread indefinitely while other high-priority (or real-time priority) threads take all the CPU time. It is not because the authors of the standard think it's reasonable for a compiler to statically decide that a loop should never see a store that wasn't visible before the first iteration.


    See also Why set the stop flag using `memory_order_seq_cst`, if you check it with `memory_order_relaxed`? (no reason, that was bad advice from Herb Sutter's talk.)

    In your case, spin-waiting before reading a value, you'd want .load(std::memory_order_acquire), but the compiler still couldn't transform to an infinite loop for .load(relaxed).

    The optimization would only be valid in a single-threaded program, which this isn't. Or a broken cooperative-multitasking implementation on a single-core machine. I say "broken" because it would violate the "should" recommendations in the standard that stores be visible to loads in finite time. Without preemptive multitasking, the compiler would need to invent yield calls to make code like this work, if it wants to support code that wasn't written for cooperative multitasking.

    (Infinite loops without I/O, a volatile access, or an atomic or synchronization operation are UB. But this loop does perform atomic operations, and it is not actually infinite as long as another thread does set the flag at some point.)