c++, multithreading, c++11, atomic, cpu-cores

Does C++11 atomic automatically solve multi-core race on variable read-write?


I know that atomic<T> applies a lock to a variable of type "T" when multiple threads are reading and writing it, making sure only one of them does the read/write at a time.

But on a multi-core computer, threads can run on different cores, and different cores have their own L1 and L2 caches while sharing the L3 cache. We also know that a C++ compiler will sometimes optimize a variable into a register, so if a variable is not stored in memory, there is no memory synchronization of that variable between the caches of different cores.

If an atomic<T> variable is optimized into a register by the compiler, then it is not stored in memory, and when one core writes its value, another core could read a stale value, right? Is there any guarantee of data consistency here?
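
To make the scenario concrete, here is a minimal sketch of what I mean (the flag names are mine, just for illustration):

```cpp
#include <atomic>
#include <thread>

bool plain_flag = false;           // plain variable: the compiler may keep it in a register
std::atomic<bool> ready{false};    // the atomic variable my question is about

void reader() {
    // while (!plain_flag) {}      // with the plain flag, this loop may spin forever:
                                   // the load can legally be hoisted into a register
    while (!ready.load()) {}       // with the atomic, is a fresh value guaranteed?
}

void writer() {
    plain_flag = true;
    ready.store(true);
}

int main() {
    std::thread r(reader);
    std::thread w(writer);
    r.join();
    w.join();
}
```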


Solution

  • Atomics don't "solve" things in the loose way you describe. They provide certain very specific guarantees involving the consistency of memory, based on ordering.

    Various compilers implement these guarantees in different ways on different platforms.

    On x86/64, no locks are used for atomic integers and pointers up to a reasonable size. And the hardware provides stronger guarantees than the standard requires, making some of the more esoteric ordering options equivalent to full sequential consistency.
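    You can ask the implementation about this directly; a small sketch (is_lock_free is C++11, is_always_lock_free is C++17):

    ```cpp
    #include <atomic>
    #include <cstdio>

    int main() {
        std::atomic<int>   i{0};
        std::atomic<void*> p{nullptr};

        // Runtime query, available since C++11; on x86/64 both typically print 1.
        std::printf("atomic<int>   lock-free: %d\n", (int)i.is_lock_free());
        std::printf("atomic<void*> lock-free: %d\n", (int)p.is_lock_free());

    #if __cplusplus >= 201703L
        // Compile-time query, added in C++17.
        static_assert(std::atomic<int>::is_always_lock_free);
    #endif
    }
    ```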

    I won't be able to fully answer your question but I can point you in the right direction; the topic you need to learn about is "the C++ memory model".

    That being said, atomics exist in order to avoid the exact problem you describe. If you ask for full memory order consistency, and thread A modifies X then Y, no other thread can see Y modified but not X. How that guarantee is provided is not specified by the C++ standard; cache line invalidation, special instructions for access, barring certain register-based optimizations by the compiler, and the like are all the kinds of things compilers do.
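    As a sketch of that guarantee (the variable names are illustrative): with the default memory_order_seq_cst, a thread that observes the write to Y must also observe the earlier write to X.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};

    void thread_a() {
        x.store(1);                  // memory_order_seq_cst by default
        y.store(1);
    }

    void thread_b() {
        if (y.load() == 1) {
            // Seeing y == 1 means thread A's earlier write to x
            // must also be visible: this assert can never fire.
            assert(x.load() == 1);
        }
    }

    int main() {
        std::thread a(thread_a), b(thread_b);
        a.join();
        b.join();
    }
    ```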

    Note that the C++ memory model was refined, bug-fixed and polished for C++17 in order to describe the behaviour of the new parallel algorithms and permit their efficient implementation on GPU hardware (among other targets) with the right flags, and in turn it influenced the guarantees that new GPU hardware provides. So people talking about memory models may be excited about more modern issues than your mainly-C++11 concerns.
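    For illustration only, one of those parallel algorithms looks like this (C++17; with GCC's libstdc++ this typically needs TBB at link time):

    ```cpp
    #include <execution>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> v(1'000'000, 1);

        // std::execution::par was added in C++17; the refined memory model
        // underpins what such algorithms may assume about element accesses.
        long long sum = std::reduce(std::execution::par, v.begin(), v.end(), 0LL);
        return sum == 1'000'000 ? 0 : 1;
    }
    ```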

    This is a big, complex topic. It is really easy to write code you think is portable, yet that in fact only works on a specific platform, or only usually works on the platform you tested it on. But that is just because threading is hard.
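    To make that concrete, here is a sketch of code that "usually works" on x86/64 but is formally broken; the relaxed orderings below are deliberately wrong:

    ```cpp
    #include <atomic>
    #include <thread>

    int payload = 0;                   // ordinary data, meant to be published via the flag
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;
        // Deliberately wrong: relaxed creates no happens-before edge, so the
        // consumer may observe ready == true while payload is still 0 on
        // weakly ordered hardware such as ARM. x86/64's strong ordering
        // usually hides the bug. The fix: _release here, _acquire below.
        ready.store(true, std::memory_order_relaxed);
    }

    void consumer() {
        while (!ready.load(std::memory_order_relaxed)) {}
        int v = payload;               // formally a data race; may be stale
        (void)v;
    }

    int main() {
        std::thread p(producer), c(consumer);
        p.join();
        c.join();
    }
    ```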