Tags: c++, multithreading, c++20, memory-barriers, stdatomic

Confusing orderings in C++ with memory_order_acq_rel vs memory_order_seq_cst


I'm struggling to understand the ordering that's actually happening to cause this test to occasionally deadlock. I apologize that the example is a bit long; it was hard to reliably reproduce my issue.

Ideally, inc_thread increments data then sets flag and notifies dec_thread.

dec_thread waits for flag to be set while data is non-zero then decrements data and clears flag.

This repeats REP_COUNT times for each thread with data ending as 0.

If flag.store(false, mem_order); uses std::memory_order_acq_rel, this will occasionally deadlock, with dec_thread waiting on flag while data is non-zero. Why?

This happens notably even when all other operations are implicitly seq_cst or explicitly acq_rel.

#include <atomic>
#include <thread>
#include <cstdint>

void race_test() {
  // always works
  //constexpr auto mem_order = std::memory_order_seq_cst;

  // occasionally deadlocks
  constexpr auto mem_order = std::memory_order_acq_rel;

  std::atomic<std::size_t> data = 0;
  std::atomic_bool flag = false;

  constexpr std::size_t REP_COUNT = 2000;
  std::thread dec_thread{[&] {
    for (std::size_t i = 0; i != REP_COUNT; ++i) {
      while (data.load() == 0) {
        flag.wait(false);
        flag.store(false, mem_order);
      }
      --data;
    }
  }};

  std::thread inc_thread{[&] {
    for (std::size_t i = 0; i != REP_COUNT; ++i) {
      ++data;
      flag.store(true);
      flag.notify_one();
    }
  }};

  inc_thread.join();
  dec_thread.join();
}

int main()
{
  for (std::size_t i = 0; i != 1000; ++i)
    race_test();
}

When changing mem_order to memory_order_seq_cst everything seems to always work properly.

The only problem I can imagine is that flag.store(false, mem_order); is being reordered after data.load() (and after the flag.store(true); in inc_thread), but shouldn't acq_rel prevent that?

From what I understand, memory_order_seq_cst only prevents reordering with other memory_order_seq_cst operations, but then shouldn't making everything memory_order_acq_rel fix this? (It doesn't.)

Assuming that changing flag.store(false) to seq_cst is even enough to make this correct, could someone explain why? I'd appreciate if someone could state the most "relaxed" order for each atomic operation that still keeps this race-free.

Thank you.


Solution

  • atomic::store (cppreference): "If order is one of std::memory_order_consume, std::memory_order_acquire and std::memory_order_acq_rel, the behavior is undefined."

    So your program has undefined behavior, and asking why it sometimes deadlocks is therefore pointless. The only orders a plain store permits are std::memory_order_relaxed, std::memory_order_release and std::memory_order_seq_cst; you need to pick one of those.