c gcc x86-64 inline-assembly red-zone

How do I tell gcc that my inline assembly clobbers part of the stack?


Consider inline assembly like this:

uint64_t flags;
asm ("pushf\n\tpop %0" : "=rm"(flags) : : /* ??? */);

Notwithstanding the fact that there is probably some kind of intrinsic to get the contents of RFLAGS, how do I indicate to the compiler that my inline assembly clobbers one quadword of memory at the top of the stack?


Solution

  • Apart from Peter Cordes's approach of skipping the redzone:

    long getflags0(void){
        long f; __asm(
            "add $-128, %%rsp;\n"
            "pushf; pop %0;\n"
            "sub $-128, %%rsp\n" : "=r"(f) :: );
        return f;
    }
    

    which renders:

    0000000000000000 <getflags0>:
       0:   48 83 c4 80             add    $0xffffffffffffff80,%rsp
       4:   9c                      pushfq 
       5:   58                      pop    %rax
       6:   48 83 ec 80             sub    $0xffffffffffffff80,%rsp
       a:   c3                      retq   
    $sz(getflags0)=11
    

    you can also just list rsp as a clobber and silence the deprecation warning:

    long getflags(void){
        long f;
        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wdeprecated"
        __asm("pushf; pop %0" : "=r"(f) :: "rsp");
        #pragma GCC diagnostic pop
        return f;
    }
    

    which renders:

    000000000000000b <getflags>:
       b:   55                      push   %rbp
       c:   48 89 e5                mov    %rsp,%rbp
       f:   9c                      pushfq 
      10:   58                      pop    %rax
      11:   c9                      leaveq 
      12:   c3                      retq   
    $sz(getflags)=8
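
    One caveat on the red-zone-skipping variant: add and sub write the arithmetic flags (CF, ZF, SF, OF, AF, PF), so the add that runs before pushf changes the status bits that get captured. If those bits matter, lea can move rsp without touching any flags. A minimal sketch, where getflags_lea is a hypothetical name and "=r" is chosen so the output can never be an rsp-relative memory operand:

    ```c
    #include <stdint.h>

    /* Hypothetical helper, not from the answer above: reads RFLAGS while
       stepping over the 128-byte red zone. lea adjusts rsp without writing
       any flags, so pushfq stores the flags as they were when the asm
       statement started. */
    static inline uint64_t getflags_lea(void)
    {
        uint64_t f;
        __asm__ volatile(
            "lea -128(%%rsp), %%rsp\n\t"  /* skip the red zone */
            "pushfq\n\t"                  /* store RFLAGS below it */
            "pop %0\n\t"                  /* load it into a register */
            "lea 128(%%rsp), %%rsp"       /* restore rsp */
            : "=r"(f)
            :
            : "memory");                  /* assumption: conservative, the asm writes stack memory */
        return f;
    }
    ```

    Bit 1 of RFLAGS is reserved and always reads as 1, so (getflags_lea() & 2) makes a cheap sanity check.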
    

    From experience (I've played with this quite a bit), gcc actually handles rsp clobbers quite well: it forces a frame pointer (which it won't let you clobber alongside rsp -- that's a hard error), avoids the redzone, addresses locals relative to the frame pointer, and emits %rsp-restoring code at the end of the function.

    The mechanism of making the compiler let go of the end of the stack is needed for VLAs and allocas anyway, so I don't think it's going anywhere.

    I think such rsp clobbers are quite usable for custom stack allocations, frees, and stack switches, as long as you don't mess with what the compiler spilled below the stack pointer it gave you (or open it up to being messed with).

    I only had some issues with this approach on clang, but the fix to the compiler seems trivial: https://github.com/llvm/llvm-project/issues/61898.

    As for suppressing the warnings without affecting the whole compilation unit,

    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wdeprecated"
    //...
    #pragma GCC diagnostic pop
    

    can work well inside a function (including an inline one; I had no issues with rsp clobbers inside inline functions either), or you can generate the pragmas with _Pragma to make them usable inside macros.
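
    The _Pragma route could look roughly like this. It is only a sketch: the macro and function names are hypothetical, and it assumes the compiler honors diagnostic pragmas that come from a macro expansion at the point where the rsp-clobber warning is issued.

    ```c
    #include <stdint.h>

    /* Hypothetical macro names. _Pragma takes a string literal, so the
       quotes around the warning name have to be escaped. */
    #define RSP_CLOBBER_BEGIN \
        _Pragma("GCC diagnostic push") \
        _Pragma("GCC diagnostic ignored \"-Wdeprecated\"")
    #define RSP_CLOBBER_END \
        _Pragma("GCC diagnostic pop")

    /* Stores RFLAGS into the uint64_t lvalue `out`, listing rsp as a clobber
       so the compiler keeps its own data out of the way of the push. */
    #define READ_FLAGS(out) \
        do { \
            RSP_CLOBBER_BEGIN \
            __asm__ volatile("pushfq; pop %0" : "=r"(out) : : "rsp"); \
            RSP_CLOBBER_END \
        } while (0)

    /* Example use of the macro inside an ordinary inline function. */
    static inline uint64_t getflags_macro(void)
    {
        uint64_t f;
        READ_FLAGS(f);
        return f;
    }
    ```

    Keeping the push/pop pair inside the macro means the suppression covers only the one asm statement and nothing else in the translation unit.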

    Clang doesn't complain about rsp clobbers unless you compile with -fstack-clash-protection; then the warning is -Wstack-protector, and it can be silenced the same way. (You will still run into issues on clang if you use rsp clobbers for memory allocation, unless you apply my fix to a custom build.)

    Keep in mind that while this happens to work, it is not officially supported. From https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1:

    The compiler requires the value of the stack pointer to be the same after an asm statement as it was on entry to the statement. However, previous versions of GCC did not enforce this rule and allowed the stack pointer to appear in the list, with unclear semantics. This behavior is deprecated and listing the stack pointer may become an error in future versions of GCC.
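
    If the goal is only to read RFLAGS, a route that stays within supported behavior is the compiler builtin, which leaves the stack and red-zone bookkeeping to the compiler. A minimal sketch, assuming GCC or Clang targeting x86-64; getflags_builtin is a hypothetical wrapper name:

    ```c
    #include <stdint.h>

    /* __builtin_ia32_readeflags_u64 is a GCC/Clang builtin for x86-64;
       <x86intrin.h> exposes the same operation as __readeflags(). */
    static inline uint64_t getflags_builtin(void)
    {
        return __builtin_ia32_readeflags_u64();
    }
    ```

    This needs no clobber list at all, so it sidesteps the deprecation warning, but it does not help with the other uses of rsp clobbers discussed here (custom allocations, stack switches).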


    More analysis:

    An sp clobber on gcc (& my hacked clang) basically means:

    (1) Align sp and assume it stays aligned, ready for calls, across any subsequent sp changes;
    (2) Stop using the redzone, assuming it might get clobbered;
    (3) Set up and use bp for spills;
    (4) Do a `leave` or equivalent at fn exit.

    Transient misalignment is OK (e.g., asm("push ...")) as long as it doesn't persist to a point where a call can be made. These semantics are good, IMO, and simplify a lot of things: inline asm allocas/frees, inline asm calls (maybe with a custom calling convention), and stuff like pushf; pop %0.

    All of this is basically equivalent to treating funcs with "opaque SP adjustments" as having variable sized objects, hence the easy fix for clang: https://github.com/llvm/llvm-project/issues/61898

    diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
    index 7d11d63d4..5cc2045e4 100644
    --- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
    +++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
    @@ -352,7 +352,7 @@ public:
       /// This method may be called any time after instruction
       /// selection is complete to determine if the stack frame for this function
       /// contains any variable sized objects.
    -  bool hasVarSizedObjects() const { return HasVarSizedObjects; }
    +  bool hasVarSizedObjects() const { return HasVarSizedObjects || HasOpaqueSPAdjustment; }
    

    Without this easy fix, clang does set up a frame and does use it, but it fails to restore rsp from the frame at function exit and it fails to disable the redzone: https://godbolt.org/z/TTen9js4f

    (In theory, the semantics of rsp clobbers could get more fine-grained:

    E.g., pushf; pop %0 only needs (2). A call from inline asm only needs (1) and (2), but not (3) and (4); for (1), it's better if the compiler aligns the stack with a push or sub/add instead of requiring you to and $-16,%rsp. Rsp-changing asm needs (1), (2), and (3), but maybe not (4), though leaving (4) to a compiler that has already set up a bp can be better.)