
Why does memory_order_relaxed use atomic (lock-prefixed) instructions on x86?


On Visual C++ 2013, when I compile the following code:

#include <atomic>

int main()
{
    std::atomic<int> v(2);
    return v.fetch_add(1, std::memory_order_relaxed);
}

I get back the following assembly on x86:

51               push        ecx  
B8 02 00 00 00   mov         eax,2 
8D 0C 24         lea         ecx,[esp] 
87 01            xchg        eax,dword ptr [ecx] 
B8 01 00 00 00   mov         eax,1 
F0 0F C1 01      lock xadd   dword ptr [ecx],eax 
59               pop         ecx  
C3               ret              

and similarly on x64:

B8 02 00 00 00    mov         eax,2 
87 44 24 08       xchg        eax,dword ptr [rsp+8] 
B8 01 00 00 00    mov         eax,1 
F0 0F C1 44 24 08 lock xadd   dword ptr [rsp+8],eax 
C3                ret              

I simply don't understand: why does a relaxed increment of an int variable require a lock prefix?

Is there a reason for this, or did they simply not include the optimization of removing it?


* I used /O2 with /NoDefaultLib to trim it down and get rid of unnecessary C runtime code, but that's irrelevant to the question.


Solution

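  • Because the lock prefix is what makes the increment atomic in the first place. memory_order_relaxed relaxes ordering, not atomicity, and on x86 an atomic read-modify-write such as fetch_add cannot be done on memory without a lock-prefixed instruction (xchg with a memory operand is implicitly locked). Note that "lock" here is an instruction prefix, not a mutex: the operation is still lock-free in the C++ sense.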

    Imagine the same thing without the lock prefix, i.e. as a plain load, add, and store:

    v = 0;
    

    Then spawn 100 threads, each executing:

    v++;
    

    Then wait for all threads to finish. What would you expect v to be? Unfortunately, it may not be 100. Suppose one thread loads v == 23, and before it writes back 24, another thread also loads 23 and also writes 24. The two increments collapse into one, so an update is lost. This happens because the increment as a whole is not atomic: the load, the add, and the store may each be atomic on their own, but another thread can slip in between the steps.

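    But with std::atomic, every operation is atomic, regardless of the std::memory_order argument. The memory order only controls how the operation is ordered relative to other memory accesses. memory_order_relaxed still guarantees atomicity; it only drops the ordering guarantees with respect to accesses to other variables. Operations on the same atomic object still follow a single total modification order, so no increment is ever lost. On x86, a lock-prefixed instruction is also a full memory barrier, so a relaxed and a seq_cst fetch_add compile to the same lock xadd: there is no cheaper atomic increment for the compiler to fall back to.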