I'm trying to understand memory fences in C++11. I know there are better ways to do this (atomic variables and so on), but I wondered whether this usage is correct. I realize that this program doesn't do anything useful; I just wanted to make sure that the fence functions do what I think they do.
Basically: does the release fence ensure that any changes made in this thread before the fence are visible to other threads after the fence, and does the acquire fence in the second thread ensure that any changes to the variables are visible in that thread immediately after the fence?
Is my understanding correct, or have I missed the point entirely?
#include <iostream>
#include <atomic>
#include <thread>

int a;

void func1()
{
    for (int i = 0; i < 1000000; ++i)
    {
        a = i;
        // Ensure that changes to a to this point are visible to other threads
        std::atomic_thread_fence(std::memory_order_release);
    }
}

void func2()
{
    for (int i = 0; i < 1000000; ++i)
    {
        // Ensure that this thread's view of a is up to date
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout << a;
    }
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join(); t2.join();
}
TL;DR:
Your code is not correct, for two reasons. 1) Using a fence requires a second atomic operation in each thread in order for the fences to actually 'synchronize' with each other and ensure appropriate sequencing. 2) The accesses are in loops, so even if the fences synchronized on some particular iteration, that does not avoid a data race with the next iteration. E.g., say the fences synchronize and ensure that a = 40 is properly sequenced before cout << a; you still have a data race between that cout << a and a = 41.
Full answer:
Your usage of fences does not ensure that your assignments to a are visible to other threads, or that the value you read from a is 'up to date.' This is because, although you have the basic idea of where fences should be used, your code does not actually meet the exact requirements for those fences to "synchronize".
Here's an example that demonstrates correct usage.
#include <iostream>
#include <atomic>
#include <thread>

std::atomic<bool> flag(false);
int a;

void func1()
{
    a = 100;
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

void func2()
{
    while (!flag.load(std::memory_order_relaxed))
        ;
    std::atomic_thread_fence(std::memory_order_acquire);
    std::cout << a << '\n'; // guaranteed to print 100
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join(); t2.join();
}
(N.B., Thread Sanitizer does not currently support atomic_thread_fence()
and will report a race in this code, but it's a false positive.)
This code adds a flag so that it can meet the specific requirements for the fences to synchronize. From the spec:
A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X – § [atomics.fences]/2
The load and store on the atomic flag do not synchronize with each other, because they both use the relaxed memory ordering. Without the fences, the operations on a would be a data race. With the fences we do get synchronization, because we've got all our As, Bs, Ms, Xs, and Ys in order.
This synchronization means that anything that happens-before the release fence happens-before anything that happens-after the acquire fence. Therefore the non-atomic write to a happens-before the non-atomic read of a.
Things get trickier when you're writing a variable in a loop, because you might establish a happens-before relation for some particular iteration but not for other iterations, causing a data race.
So here's an example using two fences on each thread so that all reads and writes are ordered with respect to both the previous and next iterations of the loop on the other thread:
#include <iostream>
#include <atomic>
#include <thread>

std::atomic<bool> flag(false);
int a;

void func1()
{
    for (int i = 0; i < 10; ++i)
    {
        while (flag.load(std::memory_order_relaxed))
            ;
        std::atomic_thread_fence(std::memory_order_acquire);
        a = i;
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    }
}

void func2()
{
    for (int i = 0; i < 10; ++i)
    {
        while (!flag.load(std::memory_order_relaxed))
            ;
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout << a << '\n';
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(false, std::memory_order_relaxed);
    }
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join(); t2.join();
}
Of course, because each thread does its thing, flips the flag, and then waits for the other thread to flip it back, there's not really any concurrency; but the fences do ensure that all the accesses have well-defined sequencing and that there are no data races.