
Is std::atomic::fetch_add a serializing operation on x86-64?


Consider the following code:

#include <atomic>

std::atomic<int> counter{0};

void f() {
    /* otherStuff 1 */
    counter.fetch_add(1, std::memory_order_relaxed);
    /* otherStuff 2 */
}

Is there any instruction on x86-64 (say, on architectures less than 5 years old) that would allow otherStuff 1 and 2 to be reordered across the fetch_add, or is it always going to be serializing?

EDIT:

It looks like this boils down to "Is lock add a memory barrier on x86?", and it seems it is not, though I am not sure where to find a reference for that.


Solution

  • First, let's look at what the compiler is allowed to do when using std::memory_order_relaxed.
    If there are no dependencies between otherStuff 1/2 and the atomic operation, it can certainly reorder the statements. For example:

    g = 3;
    a.fetch_add(1, std::memory_order_relaxed);
    g += 12;
    

    clang++ generates the following assembly:

    lock   addl $0x1,0x2009f5(%rip)        # 0x601040 <a>
    movl   $0xf,0x2009e7(%rip)             # 0x60103c <g>
    

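    Here clang has moved g = 3 past the atomic fetch_add and merged it with g += 12 into a single store of 15 ($0xf). This is a legitimate transformation, because a relaxed operation imposes no ordering on the surrounding memory accesses.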
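    A self-contained version of the snippet above might look like the following sketch. The declarations of g and a and the function name relaxed_increment are assumptions, since the original shows only the three statements. In a single thread the result is the same whether or not the compiler merges the two writes to g into one store of 15; only another thread racing on g could tell the difference, and such a race is undefined behavior anyway.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical declarations: the original snippet does not show them.
int g = 0;               // plain, non-atomic global
std::atomic<int> a{0};   // the atomic counter

void relaxed_increment() {
    g = 3;
    // Relaxed RMW: no ordering constraint on the surrounding non-atomic
    // accesses, so the compiler may sink `g = 3` below it and fold it
    // together with `g += 12` into a single store of 15.
    a.fetch_add(1, std::memory_order_relaxed);
    g += 12;
}
```

    Compiling this with optimizations and inspecting the assembly is a quick way to see the merged store for yourself; the exact output depends on the compiler and version.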

    When using std::memory_order_seq_cst, the compiler output becomes:

    movl   $0x3,0x2009f2(%rip)        # 0x60103c <g>
    lock   addl $0x1,0x2009eb(%rip)   # 0x601040 <a>
    addl   $0xc,0x2009e0(%rip)        # 0x60103c <g>
    

    The statements are not reordered here because the compiler is not allowed to do so. A read-modify-write (RMW) operation with sequentially consistent ordering is both a release and an acquire operation; as such, no (visible) reordering of the surrounding statements across it is allowed, at either the compiler or the CPU level.
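    The release half of that guarantee is what makes the classic message-passing pattern work. The sketch below is illustrative only: payload, ready and publish_and_read are hypothetical names. The producer writes plain data and then performs a seq_cst fetch_add; a consumer whose acquire load observes the increment is guaranteed to see the earlier write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;             // plain, non-atomic data
std::atomic<int> ready{0};   // flag bumped by the producer's RMW

// Runs one producer/consumer round and returns what the consumer read.
int publish_and_read() {
    payload = 0;
    ready.store(0, std::memory_order_relaxed);  // reset before any thread starts

    int observed = -1;
    std::thread consumer([&] {
        // Acquire load: once it sees the producer's RMW, every write
        // sequenced before that RMW is visible in this thread.
        while (ready.load(std::memory_order_acquire) == 0) {
        }
        observed = payload;
    });
    std::thread producer([] {
        payload = 42;  // may not be moved below the seq_cst (release) RMW
        ready.fetch_add(1, std::memory_order_seq_cst);
    });

    producer.join();
    consumer.join();
    return observed;  // safe to read: join() synchronizes with the thread
}
```

    With memory_order_relaxed on the fetch_add, the RMW would no longer be a release operation, and the consumer's read of payload would be a data race.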

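    Your question is whether, on x86-64, std::atomic::fetch_add with relaxed ordering is a serializing operation.
    The answer is yes, if you do not take compiler reordering into account (serializing here in the memory-ordering sense of a full barrier; Intel's manuals reserve the term "serializing instruction" for instructions such as CPUID that also drain the instruction pipeline).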

    On x86, an atomic RMW operation (a lock-prefixed instruction, or xchg with a memory operand) always drains the store buffer and is therefore effectively a serializing, sequentially consistent operation.
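    The store-buffering litmus test is the usual way to see this property. With plain stores, x86 lets each thread's load complete before its own earlier store has left the store buffer, so both threads can read 0. The sketch below replaces those stores with seq_cst fetch_add; the C++ standard forbids the (0, 0) outcome for seq_cst operations, and on x86 compilers typically emit a lock-prefixed add or xadd for the RMW. The names x, y and both_read_zero are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared locations for the store-buffering litmus test.
std::atomic<int> x{0}, y{0};

// Runs one round; returns true if both threads read 0 (the "store buffer" outcome).
bool both_read_zero() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.fetch_add(1, std::memory_order_seq_cst);  // locked RMW on x86
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.fetch_add(1, std::memory_order_seq_cst);  // locked RMW on x86
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;  // forbidden for seq_cst operations
}
```

    If both operations were relaxed, the (0, 0) outcome would become legal in C++. On x86 the locked instruction itself would still keep the load from passing it, so any (0, 0) result would come from the compiler moving the load above the RMW, which is exactly the caveat in the answer above.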

    You can say that, on an x86 CPU, each RMW operation: