I don't understand why there would be problems without the release sequence if we have 2 threads in the example below. There are only 2 operations on the atomic variable count, and count is decremented sequentially, as shown in the output.
From C++ Concurrency in Action by Anthony Williams:
I mentioned that you could get a synchronizes-with relationship between a store to an atomic variable and a load of that atomic variable from another thread, even when there’s a sequence of read-modify-write operations between the store and the load, provided all the operations are suitably tagged. If the store is tagged with memory_order_release, memory_order_acq_rel, or memory_order_seq_cst, and the load is tagged with memory_order_consume, memory_order_acquire, or memory_order_seq_cst, and each operation in the chain loads the value written by the previous operation, then the chain of operations constitutes a release sequence and the initial store synchronizes-with (for memory_order_acquire or memory_order_seq_cst) or is dependency-ordered-before (for memory_order_consume) the final load. Any atomic read-modify-write operations in the chain can have any memory ordering (even memory_order_relaxed).
To see what this means (release sequence) and why it’s important, consider an atomic<int> being used as a count of the number of items in a shared queue, as in the following listing.
One way to handle things would be to have the thread that’s producing the data store the items in a shared buffer and then do count.store(number_of_items, memory_order_release) #1 to let the other threads know that data is available. The threads consuming the queue items might then do count.fetch_sub(1, memory_order_acquire) #2 to claim an item from the queue, prior to actually reading the shared buffer #4. Once the count becomes zero, there are no more items, and the thread must wait #3.
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <iostream>
#include <mutex>
std::vector<int> queue_data;
std::atomic<int> count;
std::mutex m;
void process(int i)
{
    std::lock_guard<std::mutex> lock(m);
    std::cout << "id " << std::this_thread::get_id() << ": " << i << std::endl;
}
void populate_queue()
{
    unsigned const number_of_items = 20;
    queue_data.clear();
    for (unsigned i = 0; i < number_of_items; ++i)
    {
        queue_data.push_back(i);
    }
    count.store(number_of_items, std::memory_order_release); //#1 The initial store
}
void consume_queue_items()
{
    while (true)
    {
        int item_index;
        if ((item_index = count.fetch_sub(1, std::memory_order_acquire)) <= 0) //#2 An RMW operation
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(500)); //#3
            continue;
        }
        process(queue_data[item_index - 1]); //#4 Reading queue_data is safe
    }
}
int main()
{
    std::thread a(populate_queue);
    std::thread b(consume_queue_items);
    std::thread c(consume_queue_items);
    a.join();
    b.join();
    c.join();
}
output (VS2015):
id 6836: 19
id 6836: 18
id 6836: 17
id 6836: 16
id 6836: 14
id 6836: 13
id 6836: 12
id 6836: 11
id 6836: 10
id 6836: 9
id 6836: 8
id 13740: 15
id 13740: 6
id 13740: 5
id 13740: 4
id 13740: 3
id 13740: 2
id 13740: 1
id 13740: 0
id 6836: 7
If there’s one consumer thread, this is fine; the fetch_sub() is a read, with memory_order_acquire semantics, and the store had memory_order_release semantics, so the store synchronizes-with the load and the thread can read the item from the buffer.
If there are two threads reading, the second fetch_sub() will see the value written by the first and not the value written by the store. Without the rule about the release sequence, this second thread wouldn’t have a happens-before relationship with the first thread, and it wouldn’t be safe to read the shared buffer unless the first fetch_sub() also had memory_order_release semantics, which would introduce unnecessary synchronization between the two consumer threads. Without the release sequence rule or memory_order_release on the fetch_sub operations, there would be nothing to require that the stores to the queue_data were visible to the second consumer, and you would have a data race.
What does he mean? That both threads should see the value of count as 20? But in my output, count is decremented sequentially across the threads.
Thankfully, the first fetch_sub() does participate in the release sequence, and so the store() synchronizes-with the second fetch_sub(). There’s still no synchronizes-with relationship between the two consumer threads. This is shown in figure 5.7. The dotted lines in figure 5.7 show the release sequence, and the solid lines show the happens-before relationships.
What does he mean? That both threads should see the value of count as 20? But in my output, count is decremented sequentially across the threads.
No, he doesn't. All modifications to count are atomic, so the two reader threads would always see different values for it in the given code.
He's talking about the implications of the release sequence rule: when a thread performs a release store, the read-modify-write operations that other threads subsequently perform on the same atomic form a release sequence headed by that store. Any acquire operation that reads a value written by an operation in that sequence has a happens-before relationship with the storing thread (i.e. the completion of the store happens-before the load). This means that the load operation in each reader thread is a synchronisation point with the writer thread, and all memory operations in the writer prior to the store must complete and be visible in the reader when its corresponding load completes.
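As a rough illustration of that rule, here is a minimal sketch with three threads. The names run_chain_demo, payload and flag are made up for this example and do not come from the book. The middle thread performs a relaxed read-modify-write on the value written by the release store, and the reader only ever observes the value written by that read-modify-write, yet it can still safely read the non-atomic payload.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper for illustration only: returns the payload value
// observed by the reader thread.
int run_chain_demo()
{
    int payload = 0;              // plain, non-atomic data
    std::atomic<int> flag{0};
    int seen = -1;

    std::thread writer([&] {
        payload = 42;                               // ordinary write
        flag.store(1, std::memory_order_release);   // head of the release sequence
    });

    std::thread middle([&] {
        // Relaxed RMW: succeeds only once it reads the 1 written by the
        // release store, so it continues the release sequence.
        int expected = 1;
        while (!flag.compare_exchange_weak(expected, 2, std::memory_order_relaxed))
        {
            expected = 1;
            std::this_thread::yield();
        }
    });

    std::thread reader([&] {
        // Acquire load that reads the value written by the RMW, not by the store.
        while (flag.load(std::memory_order_acquire) != 2)
        {
            std::this_thread::yield();
        }
        seen = payload;   // safe: the release store synchronizes-with this load
    });

    writer.join();
    middle.join();
    reader.join();
    return seen;
}
```

Calling run_chain_demo() from main should return 42. If the middle thread used a plain store of 2 instead of a read-modify-write, the chain would be broken and the read of payload would be a data race.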
He's saying that without this rule, only the first thread would be synchronised with the writer in this way. The second thread would therefore have a data race in accessing queue_data (note: not count, which is protected anyway by atomic access). Theoretically, the writes to queue_data made before the store to count might become visible to reader thread number 2 only after its own load of count. The release sequence rule ensures that this will not happen.
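To make the two-consumer case concrete, here is a sketch of a terminating variant of the listing from the question. It is not the book's code: run_queue_demo, kItems and the processed counter are assumptions added so that the threads can finish and the result can be checked, and the relaxed load before fetch_sub is only there to keep count from being decremented without bound while the queue is empty.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

constexpr int kItems = 1000;   // hypothetical item count for the demo

// Hypothetical helper: returns the sum of all items read by both consumers.
long long run_queue_demo()
{
    std::vector<int> queue_data;
    std::atomic<int> count{0};
    std::atomic<int> processed{0};   // added only so the consumers can terminate
    long long sums[2] = {0, 0};

    auto consume = [&](long long& sum) {
        while (processed.load(std::memory_order_relaxed) < kItems)
        {
            if (count.load(std::memory_order_relaxed) <= 0)
            {
                std::this_thread::yield();   // nothing published or nothing left
                continue;
            }
            // RMW with acquire: it reads either the release store itself or a
            // value written by another fetch_sub in the same release sequence.
            int item_index = count.fetch_sub(1, std::memory_order_acquire);
            if (item_index <= 0)
            {
                continue;   // the other consumer claimed the last item first
            }
            sum += queue_data[item_index - 1];   // safe read of the shared buffer
            processed.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread b(consume, std::ref(sums[0]));
    std::thread c(consume, std::ref(sums[1]));
    std::thread a([&] {
        for (int i = 0; i < kItems; ++i)
        {
            queue_data.push_back(i);
        }
        count.store(kItems, std::memory_order_release);   // publish the items
    });

    a.join();
    b.join();
    c.join();
    return sums[0] + sums[1];
}
```

Every item from 0 to 999 is claimed exactly once, so the returned sum should be 499500 however the items are split between the two consumers. The point is that neither consumer needs release semantics on its fetch_sub for the reads of queue_data to be safe.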
In summary: the release sequence rule ensures that multiple threads can synchronise their loads on a single store. The synchronisation in question is that of memory accesses to data other than the actual atomic variable being synchronised on (which is guaranteed to be synchronised anyway due to being atomic).
A note to add here: for the most part, these kinds of issues are only of concern on CPU architectures that are relaxed about reordering their memory operations. The Intel x86 architecture is not one of them: it is strongly ordered and has only a few very specific circumstances in which memory operations can ever be reordered. These kinds of nuances are mostly only relevant when talking about other architectures, such as ARM and PowerPC. The C++ rules still apply on every platform, though, because the compiler is also allowed to reorder memory operations.