I am debugging some code using a Boost C++ library, which uses Windows InterlockedDecrement and InterlockedIncrement.
In the generated assembly, InterlockedIncrement uses LOCK INC, whereas InterlockedDecrement uses LOCK XADD.
Why do they not both use LOCK XADD?
(This is on Win 7 64, 64-bit compiling and MSVC 11)
The INC
instruction has a shorter encoding. You could implement both with LOCK XADD
, but the code would take more space in memory. They are probably identical once they get turned into uops.
Now, why not use LOCK DEC?
The problem is that both functions (InterlockedDecrement and InterlockedIncrement) are specified to return the new value, that is, the value after the increment or decrement:
long InterlockedDecrement(volatile long *addend);
long InterlockedIncrement(volatile long *addend);
So if you set out to implement these functions, you will have to use something like LOCK XADD. Your default implementation for InterlockedDecrement will have to look something like this:
mov eax, -1
lock xadd DWORD PTR [rcx], eax
dec eax
ret 0
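The same logic can be sketched in portable C++ to make the role of the trailing dec clearer. This is only an illustration, assuming std::atomic as a stand-in for the Windows intrinsics; the names interlocked_decrement and interlocked_increment are hypothetical and not part of any real API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for InterlockedDecrement (not the Windows API).
long interlocked_decrement(std::atomic<long>& addend) {
    // fetch_add returns the value held BEFORE the addition, which is
    // what XADD leaves in the source register.
    long old_value = addend.fetch_add(-1);
    // Adjust to the NEW value, mirroring the trailing "dec eax".
    return old_value - 1;
}

// Hypothetical stand-in for InterlockedIncrement.
long interlocked_increment(std::atomic<long>& addend) {
    return addend.fetch_add(1) + 1;
}

// Small self-check: the functions return the new value, not the old one.
bool returns_new_value() {
    std::atomic<long> counter{5};
    bool ok = interlocked_decrement(counter) == 4;
    ok = ok && interlocked_increment(counter) == 5;
    assert(ok);
    return ok;
}
```

Because the caller is promised the new value, the old value has to be fetched atomically and then adjusted; a plain LOCK DEC updates memory but never hands the value back.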
During optimization passes, the compiler can then recognize that the return value of these functions is not being used, and replace them with LOCK INC or LOCK DEC.
It is common to see this pattern in your code for reference counting:
InterlockedIncrement(&refcount);
...
if (InterlockedDecrement(&refcount) == 0)
...
So, the compiler sees that the return value of InterlockedIncrement is discarded, and so it uses LOCK INC.
The compiler can also recognize that the return value of InterlockedDecrement is only used in a conditional, and it can substitute LOCK DEC, since DEC sets the zero flag when the result is zero and the comparison can branch on that flag directly. This is a more complicated optimization, though, and there are more opportunities for it not to happen. So, for various reasons, you may see LOCK INC in disassembly paired with LOCK XADD, depending on whether the optimization happened or not.
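To make the two cases concrete, here is a minimal reference-counting sketch. It assumes std::atomic rather than the Windows intrinsics, and the names RefCounted, retain, and release are hypothetical. Which instruction a given compiler actually emits for each call depends on the compiler, its version, and the optimization level, so treat the comments as describing what is permitted rather than what is guaranteed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object, for illustration only.
struct RefCounted {
    std::atomic<long> refcount{1};
    bool destroyed = false;
};

void retain(RefCounted& obj) {
    // The result is discarded, so only the memory update matters.
    // A compiler is free to lower this to something like LOCK INC.
    obj.refcount.fetch_add(1);
}

bool release(RefCounted& obj) {
    // fetch_sub returns the PREVIOUS value; previous == 1 means the
    // new count is zero. The result feeds only a comparison, so a
    // flag-setting LOCK DEC would suffice, but LOCK XADD is also valid.
    if (obj.refcount.fetch_sub(1) == 1) {
        obj.destroyed = true;
        return true;
    }
    return false;
}
```

The increment needs no information back from the operation, while the decrement needs at least one bit (did the count reach zero?), which is why the second case is harder for the optimizer to simplify.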
We have a limited amount of insight into the original code and the logic that the compiler uses to select LOCK INC / LOCK DEC versus LOCK XADD, but I think it is enough to understand that LOCK DEC is an optimization, and there are two main reasons why the optimization may not happen: