On Visual C++ 2013, when I compile the following code
#include <atomic>
int main()
{
    std::atomic<int> v(2);
    return v.fetch_add(1, std::memory_order_relaxed);
}
I get back the following assembly on x86:
51 push ecx
B8 02 00 00 00 mov eax,2
8D 0C 24 lea ecx,[esp]
87 01 xchg eax,dword ptr [ecx]
B8 01 00 00 00 mov eax,1
F0 0F C1 01 lock xadd dword ptr [ecx],eax
59 pop ecx
C3 ret
and similarly on x64:
B8 02 00 00 00 mov eax,2
87 44 24 08 xchg eax,dword ptr [rsp+8]
B8 01 00 00 00 mov eax,1
F0 0F C1 44 24 08 lock xadd dword ptr [rsp+8],eax
C3 ret
I simply don't understand: why does a relaxed increment of an int variable require a lock prefix?
Is there a reason for this, or did they simply not include the optimization of removing it?
* I used /O2 with /NoDefaultLib to trim it down and get rid of unnecessary C runtime code, but that's irrelevant to the question.
Because the lock prefix is still required for the operation to be atomic; even with memory_order_relaxed, an increment/decrement is a read-modify-write, and that requirement is too strict to satisfy without the lock prefix.
Imagine the same thing with no locks.
v = 0;
And then we spawn 100 threads, each with this command:
v++;
Then you wait for all the threads to finish. What would you expect v to be? Unfortunately, it may not be 100. Say one thread loads the value v = 23, and before it writes back 24, another thread also loads 23 and then writes out 24 as well. The two increments collapse into one, so an update is lost. This is because the increment itself is not atomic: the load, the add, and the store may each be atomic on their own, but an increment is all three steps together, and that sequence is not.
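As a rough sketch of that race (just one way to spell the 100-thread experiment above; the unsynchronized ++v is formally a data race, and the lost update may be rare in practice but is allowed to happen):

#include <thread>
#include <vector>
#include <iostream>

int main()
{
    int v = 0;  // plain, non-atomic int
    std::vector<std::thread> threads;
    for (int i = 0; i < 100; ++i)
        threads.emplace_back([&v] {
            // load, add, store are three separate steps here, so two threads
            // can both read the same old value and one increment gets lost
            ++v;
        });
    for (auto& t : threads)
        t.join();
    std::cout << v << '\n';  // usually 100, but not guaranteed (data race)
}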
But with std::atomic, every operation is atomic, regardless of the std::memory_order setting. The only question is what order the operations appear to happen in relative to the other memory accesses around them. memory_order_relaxed still guarantees atomicity; it only allows the operation to be reordered with respect to other reads and writes happening near it. What is relaxed is the ordering, not the atomicity, so no increment can ever be lost.
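By contrast, here is a minimal sketch of the same experiment using std::atomic with relaxed ordering (the 100-thread setup mirrors the one above and is only illustrative):

#include <atomic>
#include <thread>
#include <vector>
#include <iostream>

int main()
{
    std::atomic<int> v(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < 100; ++i)
        threads.emplace_back([&v] {
            // the whole read-modify-write happens as one atomic step,
            // so no increment can be lost even with relaxed ordering
            v.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : threads)
        t.join();
    std::cout << v.load() << '\n';  // always prints 100
}

On x86, that fetch_add is exactly the lock xadd shown in the question's disassembly.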