c++concurrencymemory-barrierslock-free

Unexpected inter-thread happens-before relationships from relaxed memory ordering


I'm working my way through C++ concurrency in action and ran into a problem trying to understand listing 5.12, reproduced below (GitHub code sample). I understand why the following should work when the release and acquire memory fences are in.

#include <assert.h>

std::atomic<bool> x,y;
std::atomic<int> z;

void write_x_then_y()
{
    x.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    y.store(true,std::memory_order_relaxed);
}

void read_y_then_x()
{
    while(!y.load(std::memory_order_relaxed));
    std::atomic_thread_fence(std::memory_order_acquire);
    if(x.load(std::memory_order_relaxed))
        ++z;
}

int main()
{
    x=false;
    y=false;
    z=0;
    std::thread a(write_x_then_y);
    std::thread b(read_y_then_x);
    a.join();
    b.join();
    assert(z.load()!=0);
}

However if I remove the fences, this example unexpectedly still works and z == 1. See below for my modified example:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic_int x(0);
std::atomic_int y(0);
std::atomic_int z(0);

void read_y_then_x() {
  while (!(y.load(std::memory_order_relaxed) == 1))
    ;
//   std::atomic_thread_fence(std::memory_order_acquire);
  if (x.load(std::memory_order_relaxed) == 1) {
    z.fetch_add(1, std::memory_order_relaxed);
  }
}

void write_x_then_y() {
  x.store(1, std::memory_order_relaxed);
//   std::atomic_thread_fence(std::memory_order_release);
  y.store(1, std::memory_order_relaxed);
}

int main() {
  for (int i = 0; i < 100'000; i++) {
    z.store(0);
    x.store(0);
    y.store(0);
    std::thread t2(read_y_then_x);
    std::thread t1(write_x_then_y);
    t2.join();
    t1.join();
    assert(z.load(std::memory_order_relaxed) == 1);
  }
}

Is there a memory ordering constraint I'm missing here? Is there a release sequence getting formed that I am unaware of?

I'm running this on an M1 mac and compiling with g++, I don't think that matters here, though.


Solution

  • There is nothing you are missing. The assert can fail.

    In particular, the compiler or CPU are permitted to do the store to y before the one to x in write_x_then_y, because the order of the two statements doesn't imply any additional happens-before constraint.

    Then the other thread may do both loads in-between the two stores, so that the if condition will evaluate to false.

    That you are not actually seeing this happen doesn't change that it is a permitted output and you have no guarantee that it won't behave differently after another compilation or even if you run the same binary often enough. It may happen only every one in a million or one in a billion runs or anything else.