[SOLVED] is std::atomic::fetch_add a serializing operation on x86-64?

is std::atomic::fetch_add a serializing operation on x86-64?

Considering the following code:

std::atomic<int> counter;

/* otherStuff 1 */
counter.fetch_add(1, std::memory_order_relaxed);
/* otherStuff 2 */

Is there an instruction in x86-64 (say less than 5 years old architectures) that would allow otherStuff 1 and 2 be re-ordered across the fetch_add or is it going to be always serializing ?

EDIT:

It looks like this is summarized by "is lock add a memory barrier on x86 ?" and it seems it is not, though I am not sure where to find a reference for that.

Solution

First let's look at what the compiler is allowed to do when using std::memory_order_relaxed.
If there are no dependencies between otherStuff 1/2 and the atomic operation, it can certainly reorder the statements. For example:

g = 3;
a.fetch_add(1, memory_order_relaxed);
g += 12;

clang++ generates the following assembly:

lock   addl $0x1,0x2009f5(%rip)        # 0x601040 <a>
movl   $0xf,0x2009e7(%rip)             # 0x60103c <g>

Here clang took the liberty to reorder g = 3 with the atomic fetch_add operation, which is a legitimate transformation.

When using std::memory_order_seq_cst, the compiler output becomes:

movl   $0x3,0x2009f2(%rip)        # 0x60103c <g>
lock   addl $0x1,0x2009eb(%rip)   # 0x601040 <a>
addl   $0xc,0x2009e0(%rip)        # 0x60103c <g>

Reordering of statements does not take place because the compiler is not allowed to do that. Sequential consistent ordering on a read-modify-write (RMW) operation, is both a release and an acquire operation and as such, no (visible) reordering of statements is allowed on both compiler and CPU level.

Your question is whether, on X86-64, std::atomic::fetch_add, using relaxed ordering, is a serializing operation..
The answer is: yes, if you do not take into account compiler reordering.

On the X86 architecture, an RMW operation always flushes the store buffer and therefore is effectively a serializing and sequentially consistent operation.

You can say that, on an X86 CPU, each RMW operation:

is a release operation for memory operations that precede it and is an acquire operation for memory operations that follow it.
becomes visible in a single total order observed by all threads.