c++x86intrinsicsintel-tsx

Does an aborted xbegin transaction restore the stack context that existed at the xbegin start?


I am interested in encapsulating a transactional xbegin and xend inside XBEGIN( ) and XEND( ) functions, in a static assembler lib. However I am unclear how (or if) the stack gets restored to the original xbegin calling state, given an xabort originating at some other stack level (higher or lower). In other words, is the dynamic stack context (including interrupts effects) managed and rolled back as just another part of the transaction?

This assembler approach is needed for a VC++ 2010 build that doesn't have _xbegin( ) and _xend( ) intrinsics supported or available, and x64 builds cannot use _asm { } inlining.


Solution

  • related: See also David Kanter's TSX writeup for some theory on how it works under the hood and how software can benefit from it, and this blog post for some experimental performance numbers on HSW (before the TSX bug was discovered and microcode updates disabled TSX on that hardware.)


    The Intel insn ref manual entry for xbegin is pretty clear. (See the tag wiki for links to Intel's official PDF, and other stuff.)

    On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution and restores architectural state to that corresponding to the outermost XBEGIN instruction. The fallback address following an abort is computed from the outermost XBEGIN instruction.

    So the instruction works like a conditional branch, where the branch condition is "did an abort happen before XEND?" e.g.:

    ; NASM syntax, I assume MASM is similar
    ALIGN 16
    retry:
        ; eax holds abort info, all other architectural state + memory is unchanged
        inc     [retry_count]      ; or whatever other debug instrumentation you want to add
    
    global xbegin_wrapper_with_retry
    xbegin_wrapper_with_retry:
        xbegin  retry
        ret
    

    If an abort happens, it's as if all the code that ran after xbegin didn't run at all, just a jump to the fallback address with eax modified.

    You might want to do something other than just infinite retries on an abort, of course. This isn't meant to be a real example. (This article does have a real example of the kind of logic you might want to use, using intrinsics. It looks like they just test eax instead of using the xbegin as the jump in an if, unless the compiler optimizes that check. IDK if it's the most efficient way.)

    What do you mean "interrupts effects"? In current implementations, anything that changes privilege level (like a syscall or interrupt) causes a transaction abort. So ring-level changes never need to be rolled back. The CPU will just abort the transaction when it encounters anything it can't roll back. This means the possible bugs include putting something inside the transaction that always causes an abort, but not that you do something that can't be rolled back.


    You might want to try to get the compiler to emit the three-byte XEND instruction without a function call, so pushing the return address onto the stack isn't part of the transaction. e.g.

    // no idea if this is safe, or if it might get reordered by the optimizer
    #define xend_MSVC  __asm _emit 0x0F  __asm _emit   0x01 __asm _emit 0xD5
    

    I think this does still work in 64bit mode, since the doc mentions rax, and it looks like IACA's header file uses __asm _emit.

    It'll be safer to put XEND in its own wrapper function, too, I guess. You just need a stop-gap until you can upgrade to a compiler with intrinsics, so it doesn't have to be perfect as long as the extra reads/writes from the ret and call don't cause too many aborts.