Tags: c++, gcc, memory-barriers

GCC wiki memory barrier example


The following code comes from the GCC Wiki.

// -Thread 1-
y.store (20, memory_order_relaxed);
x.store (10, memory_order_relaxed);

// -Thread 2-
if (x.load (memory_order_relaxed) == 10)
 {
   assert (y.load(memory_order_relaxed) == 20); /* assert A */
   y.store (10, memory_order_relaxed);
 }

// -Thread 3-
if (y.load (memory_order_relaxed) == 10)
 assert (x.load(memory_order_relaxed) == 10); /* assert B */

Since threads don't need to be synchronized across the system, either assert in this example can actually FAIL.

I can figure out why assert A can fail. But how can assert B also fail?

If thread 3 sees y.load() == 10, doesn't that imply thread 2 has already seen x == 10 and executed its store, and therefore that x.load() == 10 in thread 3 as well?


Solution

  • This might only be possible on a machine that is not multi-copy-atomic (such as POWER), where IRIW reordering is possible. (See the related question "Will two atomic writes to different locations in different threads always be seen in the same order by other threads?")

    So T2 sees x == 10 before it's globally visible, and stores y=10.

    T3 can then read T2's store of y before the x=10 store is visible to it. (Effectively, this is StoreStore reordering of the stores as they propagate from the physical core running T1 and T2 to the physical core running T3.)

    This could be possible on real POWER or NVidia ARMv7 hardware if T1 and T2 run on different logical cores of the same physical core, and T3 runs on a separate physical core.


    In terms of the C or C++ memory models, the assert can fail because nothing guarantees visibility. The fact that one thread has seen a value doesn't imply that all threads can see that value.

    There might be other, simpler mechanisms too. But if the assert in T2 fails, y.store (10, relaxed) never happens at all, so it's not as simple as just x.load running before y.load.
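
To make the scenario above concrete, here is a single-threaded toy simulation of it. This is only a sketch under stated assumptions, not a model of any real CPU: it assumes T1 and T2 share one physical core (so T2 sees T1's stores immediately), T3 runs on a second physical core with its own view of memory, and the x=10 store may reach that second core later than the stores to y. All names (View, PendingStore, Outcome, simulate, delay_x_to_core_b) are made up for illustration.

```cpp
#include <cassert>  // only needed for the usage checks described below
#include <vector>

// Hypothetical toy model: each physical core has its own view of x and y.
struct View { int x = 0; int y = 0; };

enum class Var { X, Y };

// A store that has been issued on core A but is not yet visible to core B.
struct PendingStore { Var var; int value; };

struct Outcome {
    bool t2_ran_body = false;  // T2 saw x == 10
    bool assert_a_ok = true;   // result of assert A, if T2's body ran
    bool t3_ran_body = false;  // T3 saw y == 10
    bool assert_b_ok = true;   // result of assert B, if T3's body ran
};

// Make one store visible in the given core's view.
void apply(View& view, const PendingStore& s) {
    if (s.var == Var::X) view.x = s.value;
    else                 view.y = s.value;
}

// Assumption: T1 and T2 run on core A, T3 runs on core B.
// If delay_x_to_core_b is true, x=10 reaches core B only after T3 has run.
Outcome simulate(bool delay_x_to_core_b) {
    View core_a;                          // shared by T1 and T2
    View core_b;                          // seen by T3
    std::vector<PendingStore> to_core_b;  // in issue order

    // A store from core A is visible there at once and queued for core B.
    auto store = [&](Var var, int value) {
        PendingStore s{var, value};
        apply(core_a, s);
        to_core_b.push_back(s);
    };

    Outcome out;

    // -Thread 1-
    store(Var::Y, 20);
    store(Var::X, 10);

    // -Thread 2- reads core A's view, so it sees x == 10 early.
    if (core_a.x == 10) {
        out.t2_ran_body = true;
        out.assert_a_ok = (core_a.y == 20);
        store(Var::Y, 10);
    }

    // Propagate to core B. Stores to y keep their order (y=20, then y=10),
    // but the store to x may be held back.
    std::vector<PendingStore> late;
    for (const PendingStore& s : to_core_b) {
        if (delay_x_to_core_b && s.var == Var::X) late.push_back(s);
        else apply(core_b, s);
    }

    // -Thread 3- reads core B's view.
    if (core_b.y == 10) {
        out.t3_ran_body = true;
        out.assert_b_ok = (core_b.x == 10);
    }

    // The delayed store arrives too late to matter for T3.
    for (const PendingStore& s : late) apply(core_b, s);
    return out;
}
```

In this model, simulate(false) has both asserts pass, while simulate(true) has assert A pass and assert B fail: T3 observes y == 10 while its view of x is still 0. The model only encodes the single ordering described in the answer; it says nothing about how likely that ordering is on real hardware.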