c++gccx86inline-assemblyred-zone

Using base pointer register in C++ inline asm


I want to be able to use the base pointer register (%rbp) within inline asm. A toy example of this is like so:

void Foo(int &x)
{
    asm volatile ("pushq %%rbp;"         // 'prologue'
                  "movq %%rsp, %%rbp;"   // 'prologue'
                  "subq $12, %%rsp;"     // make room

                  "movl $5, -12(%%rbp);" // some asm instruction

                  "movq %%rbp, %%rsp;"  // 'epilogue'
                  "popq %%rbp;"         // 'epilogue'
                  : : : );
    x = 5;
}

int main() 
{
    int x;
    Foo(x);
    return 0;
}

I hoped that, since I am using the usual prologue/epilogue function-calling method of pushing and popping the old %rbp, this would be ok. However, it seg faults when I try to access x after the inline asm.

The GCC-generated assembly code (slightly stripped-down) is:

_Foo:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdi, -8(%rbp)

    # INLINEASM
    pushq %rbp;          // prologue
    movq %rsp, %rbp;     // prologue
    subq $12, %rsp;      // make room
    movl $5, -12(%rbp);  // some asm instruction
    movq %rbp, %rsp;     // epilogue
    popq %rbp;           // epilogue
    # /INLINEASM

    movq    -8(%rbp), %rax
    movl    $5, (%rax)      // x=5;
    popq    %rbp
    ret

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    leaq    -4(%rbp), %rax
    movq    %rax, %rdi
    call    _Foo
    movl    $0, %eax
    leave
    ret

Can anyone tell me why this seg faults? It seems that I somehow corrupt %rbp but I don't see how. Thanks in advance.

I'm running GCC 4.8.4 on 64-bit Ubuntu 14.04.


Solution

  • See the bottom of this answer for a collection of links to other inline-asm Q&As.

    Your code is broken because you step on the red-zone below RSP (with push) where GCC was keeping a value.


    What are you hoping to learn to accomplish with inline asm? If you want to learn inline asm, learn to use it to make efficient code, rather than horrible stuff like this. If you want to write function prologues and push/pop to save/restore registers, you should write whole functions in asm. (Then you can easily use nasm or yasm, rather than the less-preferred-by-most AT&T syntax with GNU assembler directives1.)

    GNU inline asm is hard to use, but allows you to mix custom asm fragments into C and C++ while letting the compiler handle register allocation and any saving/restoring if necessary. Sometimes the compiler will be able to avoid the save and restore by giving you a register that's allowed to be clobbered. Without volatile, it can even hoist asm statements out of loops when the input would be the same. (i.e. unless you use volatile, the outputs are assumed to be a "pure" function of the inputs.)

    If you're just trying to learn asm in the first place, GNU inline asm is a terrible choice. You have to fully understand almost everything that's going on with the asm, and understand what the compiler needs to know, to write correct input/output constraints and get everything right. Mistakes will lead to clobbering things and hard-to-debug breakage. The function-call ABI is a much simpler and easier to keep track of boundary between your code and the compiler's code.


    Why this breaks

    You compiled with -O0, so gcc's code spills the function parameter from %rdi to a location on the stack. (This could happen in a non-trivial function even with -O3).

    Since the target ABI is the x86-64 SysV ABI, it uses the "Red Zone" (128 bytes below %rsp that even asynchronous signal handlers aren't allowed to clobber), instead of wasting an instruction decrementing the stack pointer to reserve space.

    It stores the 8B pointer function arg at -8(rsp_at_function_entry). Then your inline asm pushes %rbp, which decrements %rsp by 8 and then writes there, clobbering the low 32b of &x (the pointer).

    When your inline asm is done,

    I didn't check with a debugger; one of those steps might be slightly off, but the problem is definitely that you clobber the red zone, leading to gcc's code trashing the stack.

    Even adding a "memory" clobber doesn't get gcc to avoid using the red zone, so it looks like allocating your own stack memory from inline asm is just a bad idea. (A memory clobber means you might have written some memory you're allowed to write to, e.g. a global variable or something pointed-to by a global, not that you might have overwritten something you're not supposed to.)

    If you want to use scratch space from inline asm, you should probably declare an array as a local variable and use it as an output-only operand (which you never read from).

    AFAIK, there's no syntax for declaring that you modify the red-zone, so your only options are:


    Here's what you should have done:

    void Bar(int &x)
    {
        int tmp;
        long tmplong;
        asm ("lea  -16 + %[mem1], %%rbp\n\t"
             "imul $10, %%rbp, %q[reg1]\n\t"  // q modifier: 64bit name.
             "add  %k[reg1], %k[reg1]\n\t"    // k modifier: 32bit name
             "movl $5, %[mem1]\n\t" // some asm instruction writing to mem
               : [mem1] "=m" (tmp), [reg1] "=r" (tmplong)  // tmp vars -> tmp regs / mem for use inside asm
               :
               : "%rbp" // tell compiler it needs to save/restore %rbp.
      // gcc refuses to let you clobber %rbp with -fno-omit-frame-pointer (the default at -O0)
      // clang lets you, but memory operands still use an offset from %rbp, which will crash!
      // gcc memory operands still reference %rsp, so don't modify it.  Declaring a clobber on %rsp does nothing
             );
        x = 5;
    }
    

    Note the push/pop of %rbp in the code outside the #APP / #NO_APP section, emitted by gcc. Also note that the scratch memory it gives you is in the red zone. If you compile with -O0, you'll see that it's at a different position from where it spills &x.

    To get more scratch regs, it's better to just declare more output operands that are never used by the surrounding non-asm code. That leaves register allocation to the compiler, so it can be different when inlined into different places. Choosing ahead of time and declaring a clobber only makes sense if you need to use a specific register (e.g. shift count in %cl). Of course, an input constraint like "c" (count) gets gcc to put the count in rcx/ecx/cx/cl, so you don't emit a potentially redundant mov %[count], %%ecx.

    If this looks too complicated, don't use inline asm. Either lead the compiler to the asm you want with C that's like the optimal asm, or write a whole function in asm.

    When using inline asm, keep it as small as possible: ideally just the one or two instructions that gcc isn't emitting on its own, with input/output constraints to tell it how to get data into / out of the asm statement. This is what it's designed for.

    Rule of thumb: if your GNU C inline asm start or ends with a mov, you're usually doing it wrong and should have used a constraint instead.


    Footnotes:

    1. You can use GAS's intel-syntax in inline-asm by building with -masm=intel (in which case your code will only work with that option), or using dialect alternatives so it works with the compiler in Intel or AT&T asm output syntax. But that doesn't change the directives, and GAS's Intel-syntax is not well documented. (It's like MASM, not NASM, though.) I don't really recommend it unless you really hate AT&T syntax.

    Inline asm links:

    Some of those re-iterate some of the same stuff I explained here. I didn't re-read them to try to avoid redundancy, sorry.