c++ windows x86 atomic interlocked

InterlockedDecrement uses XADD but InterlockedIncrement uses INC?


I am debugging some code that uses a Boost C++ library, which in turn uses the Windows functions InterlockedDecrement and InterlockedIncrement.

In the generated assembly, InterlockedIncrement uses LOCK INC whereas InterlockedDecrement uses LOCK XADD.

Why do they not both use LOCK XADD?

(This is on 64-bit Windows 7, compiling for 64-bit with MSVC 11.)


Solution

  • The INC instruction has a shorter encoding. You could implement both with LOCK XADD, but the code would take more space in memory. They are probably identical once they get turned into uops.

    Now, why not use LOCK DEC?

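    The problem is that LOCK INC and LOCK DEC only update memory and the flags; they do not leave the result in a register. Both functions (InterlockedDecrement and InterlockedIncrement), however, are specified to return the new value, that is, the value after the decrement or increment: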

    long InterlockedDecrement(volatile long *addend);
    long InterlockedIncrement(volatile long *addend);
    

    So if you set out to implement these functions, you will have to use something like LOCK XADD. Your default implementation for InterlockedDecrement will have to look something like this:

    mov         eax, -1
    lock xadd   DWORD PTR [rcx], eax
    dec         eax
    ret         0
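The same "fetch the old value, then adjust it" shape can be sketched in portable C++. This is only a model, assuming `std::atomic<long>` as a stand-in for the Windows functions; the names `interlocked_increment_model`, `interlocked_decrement_model` and `check_models` are hypothetical and not part of any Windows or Boost API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for InterlockedIncrement / InterlockedDecrement,
// modeled with std::atomic so the sketch does not depend on <windows.h>.

long interlocked_increment_model(std::atomic<long>& addend) {
    // fetch_add returns the OLD value (as XADD does), so add 1
    // to obtain the new value that the function must return.
    return addend.fetch_add(1) + 1;
}

long interlocked_decrement_model(std::atomic<long>& addend) {
    // fetch_sub returns the OLD value; subtract 1 to obtain the new value.
    // This mirrors "lock xadd" followed by "dec eax" in the listing above.
    return addend.fetch_sub(1) - 1;
}

// Small self-check of the return-value semantics.
void check_models() {
    std::atomic<long> v{5};
    assert(interlocked_increment_model(v) == 6);  // 5 -> 6, returns new value
    assert(interlocked_decrement_model(v) == 5);  // 6 -> 5, returns new value
    assert(v.load() == 5);
}
```

Because the caller needs the new value in a register, an instruction that hands back the old value, such as XADD, is the natural building block when the result is actually used.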
    

    During optimization passes, the compiler can then recognize that the return value of these functions is not being used, and replace them with LOCK INC or LOCK DEC.

    It is common to see this pattern in your code for reference counting:

    InterlockedIncrement(&refcount);
    ...
    
    if (InterlockedDecrement(&refcount) == 0)
        ...
    

    So, the compiler sees that the return value of InterlockedIncrement is discarded, and so it uses LOCK INC.

    The compiler can also recognize that the return value of InterlockedDecrement is only used in a conditional, and it can substitute LOCK DEC, because LOCK DEC sets the zero flag when the result is zero and the branch can test that flag directly. This is a more complicated optimization, though, so there are more opportunities for it not to happen. As a result, you may see LOCK INC in disassembly paired with LOCK XADD, depending on whether the optimization happened or not.
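The two usage patterns discussed above can be put side by side in a small sketch. It assumes `std::atomic<long>` in place of the Interlocked functions, and the `RefCount` type with its `add_ref` and `release` members is hypothetical. Which instruction a given compiler actually emits for each member depends on the compiler, its version and the optimization level, so the comments describe only what the optimizer is allowed to exploit.

```cpp
#include <atomic>

// Hypothetical reference counter illustrating the two call patterns.
struct RefCount {
    std::atomic<long> count{1};

    // The result of the increment is discarded, so the compiler is free
    // to pick an instruction that does not return a value (such as LOCK INC).
    void add_ref() {
        count.fetch_add(1);
    }

    // The new value is only compared against zero. A compiler may reduce
    // this to a flag test after a locked decrement, or it may keep the
    // XADD-style sequence and compare the register instead.
    bool release() {
        long new_value = count.fetch_sub(1) - 1;
        return new_value == 0;
    }
};
```

Here `release()` returns true exactly once, on the call that drops the count to zero, which is the point at which a reference-counted object would normally be destroyed.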

    We have a limited amount of insight into the original code and the logic that the compiler uses to select LOCK INC / LOCK DEC versus LOCK XADD, but I think it is enough to understand that LOCK DEC is an optimization, and there are two main reasons why the optimization may not happen: