I'm struggling to understand the ordering that's actually happening to cause this test to occasionally deadlock. I apologize that the example is a bit long, it was hard to reliably reproduce my issue.
Ideally,
inc_thread
increments data
then sets flag
and notifies dec_thread
.
dec_thread
waits for flag
to be set while data
is non-zero then decrements data
and clears flag
.
This repeats REP_COUNT
times for each thread with data
ending as 0
.
If flag.store(false, mem_order);
uses std::memory_order_acq_rel
this will occasionally deadlock, with dec_thread
waiting on flag
while data
is non-zero. Why?
This happens notably even when all other operations are implicitly seq_cst
or explicitly acq_rel
.
#include <atomic>
#include <thread>
#include <cstdint>
void race_test() {
// always works
//constexpr auto mem_order = std::memory_order_seq_cst;
// occasionally deadlocks
constexpr auto mem_order = std::memory_order_acq_rel;
std::atomic<std::size_t> data = 0;
std::atomic_bool flag = false;
constexpr std::size_t REP_COUNT = 2000;
std::thread dec_thread{[&] {
for (std::size_t i = 0; i != REP_COUNT; ++i) {
while (data.load() == 0) {
flag.wait(false);
flag.store(false, mem_order);
}
--data;
}
}};
std::thread inc_thread{[&] {
for (std::size_t i = 0; i != REP_COUNT; ++i) {
++data;
flag.store(true);
flag.notify_one();
}
}};
inc_thread.join();
dec_thread.join();
}
int main()
{
for (std::size_t i = 0; i != 1000; ++i)
race_test();
}
When changing mem_order
to memory_order_seq_cst
everything seems to always work properly.
All I can imagine as being the problem is that flag.store(false, mem_order);
is ordered after data.load()
(and the flag.store(true);
in inc_thread
) but shouldn't that be prevented by acq_rel
?
From what I understand memory_order_seq_cst
only prevents re-ordering with other memory_order_seq_cst
operations, but then shouldn't making everything memory_order_acq_rel
fix this? (it doesn't)
Assuming that changing flag.store(false)
to seq_cst
is even enough to make this correct, could someone explain why? I'd appreciate if someone could state the most "relaxed" order for each atomic operation that still keeps this race-free.
Thank you.
atomic::store
: "If order
is one of std::memory_order_consume
, std::memory_order_acquire
and std::memory_order_acq_rel
, the behavior is undefined."
So, your program has undefined behavior. Questioning why it sometimes deadlocks is therefore pointless. You need to use an order
with defined behavior.