c inline-assembly buffer-overflow abi epilogue

Forcing a C program to take a forged epilogue made with inline assembly to jump to an arbitrary function

This comes from a post about invoking a trivial buffer overflow to jump to a function that is present in the source but never called explicitly anywhere in the program (2333909/how-can-i-invoke-buffer-overflow). One answerer posted an interesting detour: an inline-assembly alternative that tries to force a premature epilogue, redirecting to function g().
Both from 2010.

Question context: from reading posts and blogs such as DontUseInlineAsm, it feels like inline assembly started out, at least for aficionados, as a harmless, naive and witty thing to tinker with, but with the passage of time and sophistication, due to constraints and assumptions from and for compilers (e.g. ABI calling conventions, optimizations, security, etc.), things that are expected to be in a certain way (e.g. register contents), can end up being badly overwritten, inviting silent Undefined Behavior in the worst cases, or loud errors or crashes in the best cases.
Definitely not a trivial topic: https://stackoverflow.com/tags/inline-assembly/info

My question: is it possible to update the code below so that it works by today's standards while maintaining its spirit? And of course, what is the human-understandable rationale behind those errors: why doesn't this code work nowadays? Why is it not possible to brute-force a forged epilogue in this manner?
Related considerations: does an alternative like this still make sense today? Or is the number of things a human programmer has to consider when writing inline assembly already unfathomable, and only going to get worse?

To my understanding, this part is the registers clobbering (telling the compiler what is going to be touched by programmer's action):

: "%ebp", "%esp"

I don't know what this means in this context (I suspect it has a lot to do with the address of g(), the function we intend to jump to):

: "r" (&g)

I typed in the code (see below); when compiled:

gcc -m32 -g -o overflow_inlineassembly overflow_inlineassembly.c

overflow_inlineassembly.c: In function ‘f’:
overflow_inlineassembly.c:18:9: warning: listing the stack pointer register ‘%esp’ in a clobber list is deprecated [-Wdeprecated]
   18 |         asm(
      |         ^~~
overflow_inlineassembly.c:18:9: note: the value of the stack pointer after an ‘asm’ statement must be the same as it was before the statement
overflow_inlineassembly.c:27:1: error: bp cannot be used in ‘asm’ here
   27 | }
      | ^

Code (my typing):

#include <stdio.h>


void g()
{
        printf("now inside g()!\n");
}


void f()
{
        printf("now inside f()!\n");
        // can only modify this section
        // cant call g(), maybe use g (pointer to function)

        /* x86 function epilogue-like inline assembly */
        /* Causes f() to return to g() on its way back to main() */
        asm(
                "mov %%ebp, %%esp;"
                "pop %%ebp;"
                "push %0;"
                "ret"
                : /* no output registers */
                : "r" (&g)
                : "%ebp", "%esp"
               );
}


int main (int argc, char * argv[])
{
        f();
        return 0;
}

Solution

It should still work as well as before (which isn't saying much) if you remove the "%esp", "%ebp" clobbers and compile without enabling optimization. Works for me on Linux with GCC14.2, with gcc -m32 stack.c: exits without a segfault after printing both messages.

It depends on -fno-omit-frame-pointer, which is the default at -O0 even in current GCC/Clang, and the comments assume f() won't inline into main, which the default -O0 also definitely won't do.

I don't think there's any way to make this actually safe and portable, so that it compiles to working asm with or without optimization, and with arbitrary other code earlier in f() and main(). You can't know what other registers you should be restoring in your epilogue, even if you do something like alloca or a VLA (volatile int foo[argc]; foo[0] = 1;) that forces the compiler to use a frame pointer for that function.
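As a rough illustration of the VLA idea mentioned above, here is a minimal sketch. The function name sum_below and its body are hypothetical, made up only to give the array something to do. Because the array size is only known at run time, GCC in practice keeps a frame pointer for this function even when optimizing, but that still says nothing about which other registers its epilogue restores.

```c
/* Hypothetical helper: sums 0..n-1 by way of a variable-length array.
 * The run-time sized allocation is what pushes the compiler towards
 * keeping a frame pointer for this function. */
int sum_below(int n)
{
    if (n <= 0)
        return 0;

    volatile int scratch[n];    /* VLA: size known only at run time */
    int total = 0;

    for (int i = 0; i < n; i++)
        scratch[i] = i;
    for (int i = 0; i < n; i++)
        total += scratch[i];

    return total;
}
```

Looking at the compiler's asm output for a function like this (for example with gcc -m32 -O2 -S) is the way to check whether a frame pointer was actually set up; the C source alone can't guarantee it.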

This is always going to be a dirty hack that makes assumptions about the compiler's code-gen, like the original from 2010 was.

The hand-written asm is only compatible with what modern GCC will do in a debug build (i.e. with -fno-omit-frame-pointer), and makes most sense without the compiler inlining f into main. With EBP as a frame pointer, it's an error to clobber EBP, although very old GCC didn't detect that with optimization disabled.

GCC history:

The default with no options is and always has been -O0 -fno-omit-frame-pointer, so EBP gets used as a frame pointer (for GCC, even in leaf functions). When I mention using these options, I'm contrasting the behaviour they imply against different choices, not whether you use them explicitly or by default.
With -m64, optimization has always implied -fomit-frame-pointer. (Maybe only with -O2 or -O3 in early GCC; currently even -O1 implies -fomit-frame-pointer for 64-bit code.)
With -m32, GCC4.6 and later: -O1 or higher implies -fomit-frame-pointer
With -m32 in older GCC: -fno-omit-frame-pointer is still the default even at -O3.
The i386 ABI (as used on Linux) changed somewhere around 2011 to enforce 16-byte stack alignment, after problems were discovered in 2009 that GCC had already been making binaries which depended on 16-byte stack alignment (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838#c86 / Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?). Other OSes kept their 4-byte ABI, with 16-byte alignment only something GCC does for performance. Probably not a factor here; -mpreferred-stack-boundary=4 (2^4 = 16) was the default even in ancient GCC, but is the reason for subl $8, %esp before call printf. Using EBP as a frame pointer allows an epilogue to not care about that extra offset.

The warning about ESP is due to https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1

Another restriction is that the clobber list should not contain the stack pointer register. This is because the compiler requires the value of the stack pointer to be the same after an asm statement as it was on entry to the statement. However, previous versions of GCC did not enforce this rule and allowed the stack pointer to appear in the list, with unclear semantics. This behavior is deprecated and listing the stack pointer may become an error in future versions of GCC.

(ESP is the stack pointer in 32-bit x86; EBP is the frame pointer in functions that use one.)

Execution doesn't come out the bottom of the asm statement, it jumps to g as a tailcall instead. So the ESP/EBP clobbers are pretty much irrelevant; you can just leave them out if you want to try out silly computer tricks that depend on assumptions like debug builds.

A "memory" clobber (to force all earlier side-effects to be done before the asm statement) and a __builtin_unreachable() after the asm statement (to tell the compiler execution doesn't come out the bottom, as recommended in the manual for cases where asm goto isn't usable) would be a good idea, but can't make this fully safe.

The error about EBP is a full error, except again it's not really relevant for this because execution doesn't come out the bottom of the asm statement. The error happens even in GCC3.4.2 (from 2006, 3.4.0 was 2004) and GCC4.1.2 (from 2007 / 2006), Godbolt. (Only with optimization enabled in those old compilers: they fail to detect the error with the default -O0.)

That's just a matter of not checking for something which was already a problem: if execution did come out the bottom of the asm statement with a modified EBP in a function using an EBP frame pointer, it'd crash because GCC emits asm that depends on EBP being unchanged. For example, if you used an "ebp" clobber to correctly describe asm("xor %%ebp, %%ebp" ::: "ebp") to the compiler, execution would reach GCC's leave; ret with EBP=0, so leave (works like mov %ebp, %esp ; pop %ebp) would set ESP=0 and try to pop EBP from address 0 (pop reads at ESP before incrementing it), page-faulting on unmapped memory under typical OSes. This function:

void crashme(){
    asm("xor %%ebp, %%ebp" ::: "ebp");
}

compiles to this broken asm, despite the fact that we told GCC our asm clobbers EBP.

# GCC4.1 -m32 -O0 
crashme:
        pushl   %ebp
        movl    %esp, %ebp
        xor %ebp, %ebp
        leave        # acts like   mov %ebp, %esp ;  pop %ebp
        ret

With -m32 -O1 -fomit-frame-pointer, we get safe code: push/pop of EBP around xor %ebp, %ebp. (But not mov %esp, %ebp; it's just saving/restoring it, not using it as a frame pointer. It could have saved it in a register the asm didn't clobber, like ECX, but that would rarely be a useful optimization for GCC to look for, except in code-bases with lots of badly-written inline asm that clobbered specific registers instead of letting the compiler pick.)

With -m32 -O1 -fno-omit-frame-pointer, GCC correctly refuses to compile it, with an error. (In this case the asm statement didn't have any "m"(local_var) operands where the compiler would want to use EBP as part of the addressing mode, and the compiler could have stashed the value in a register the asm statement wasn't using, but it would take extra code to handle that corner case for little value. Inline asm is already a pretty niche feature.)

Out of the compilers on Godbolt, GCC 4.4.7 is the earliest one that rejects this code even with -m32 -O0. (GCC 4.4.0 is from 2009, 4.4.7 is from 2012 but point releases probably didn't change that.) Godbolt doesn't have a GCC4.2 or 4.3.

Those early GCCs which don't error on this at -O0 also make ABI-violating code with -O0 -fomit-frame-pointer: xor %ebp, %ebp; ret, which destroys the caller's EBP, which is supposed to be a call-preserved register.

You can use an "%ebp" clobber if you compile with -fomit-frame-pointer (at -O0 or with optimization), but that's not the default in those old GCCs. And of course this hacked up manual epilogue assumes an EBP frame pointer.

So anyway, @jschmier's original didn't even compile with optimization enabled, even with old compilers from before 2010. And it depends on -fno-omit-frame-pointer (being the default or specified manually). But it did happen to compile as intended with optimization disabled (the default) on GCC 4.1 or earlier.

Other breakage mechanisms: inlining

If f inlined into main, we'd be tailcalling it from main, skipping any code between the f() call and main's closing } (including the return 0; and even the implicit C99 return 0; this would make main effectively return g(); with whatever garbage void g() leaves in EAX.)

With GCC 4.0.4 -O3, with the %ebp clobber removed so GCC will compile it, f inlines into main. The stand-alone definition for f isn't reached if you actually run the program.

main:
        pushl   %ebp
        movl    %esp, %ebp      # set up EBP as a frame pointer
        subl    $8, %esp        # align the stack
        andl    $-16, %esp      # realign the stack because this is main
        subl    $16, %esp       # allocate arg-passing space since -maccumulate-outgoing-args is the default for -mtune=generic in old GCC; old CPUs had slow PUSH
        movl    $.LC1, (%esp)   # the inside f() message
        call    puts
        movl    $g, %eax
# inline asm starts here
        mov %ebp, %esp;pop %ebp;jmp *%eax;
# inline asm ends here
        leave                # unreachable, the jmp or push/ret tailcalls g
        xorl    %eax, %eax   # main's return 0, won't be reached.
        ret

This tailcalls g from main. Copy/pasting the asm from Godbolt to my desktop and adding .global main to it, I can build it into an executable with gcc -m32 -no-pie stack-inlined.S. (I didn't need to disable the directive filter on Godbolt because the only data is read-only, and it works to have it in the .text section mixed with code, and for machine code not to be aligned. Only a performance downside.)

Running the resulting ./a.out, I still get both messages printed with no segfault, but the process exit status (echo $?) is 16: the return value of the last puts, which in GNU/Linux happens to be the length of the message in this case.

main tailcalls g, so g returns to __libc_start_call_main (defined in glibc source code), which after main returns just calls exit with its return value (i.e. push %eax ; call exit).
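To make the origin of that exit status more concrete, here is a small sketch of what main effectively turns into after the tailcall. The function name status_from_puts is hypothetical. ISO C only promises a non-negative return value from puts on success; the byte count is a glibc implementation detail, so the sketch doesn't rely on a specific number.

```c
#include <stdio.h>

/* Hypothetical stand-in for what main effectively becomes after the
 * tailcall: its "return value" is whatever the last puts left in EAX. */
int status_from_puts(void)
{
    /* ISO C: non-negative on success. glibc happens to return the
     * number of bytes written, including the newline. */
    return puts("now inside g()!");
}
```

If a real main did return status_from_puts();, the shell would see the low 8 bits of that value as the exit status.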

So this code still sort of works with optimization enabled (as long as we force GCC to use frame pointers), but there are places where code gets skipped.
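One way to rule out just the inlining failure mode, assuming GCC or Clang, is the noinline function attribute. The sketch below leaves the asm out and only shows where the attribute goes; the counter f_calls and the caller call_f_then_return_zero are hypothetical names added so the effect is observable. This doesn't make the manual epilogue safe, it only keeps f as a separate function with its own return address.

```c
#include <stdio.h>

static int f_calls = 0;   /* hypothetical counter so the call is observable */

/* GCC/Clang attribute: never inline this function into its callers,
 * so it keeps its own prologue, epilogue and return address. */
__attribute__((noinline)) void f(void)
{
    f_calls++;
    printf("now inside f()!\n");
    /* the hand-written epilogue would go here */
}

/* Hypothetical caller standing in for main. */
int call_f_then_return_zero(void)
{
    f();
    return 0;   /* would be skipped only if f's asm were inlined here */
}
```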

If there was code using more registers, they wouldn't get restored

This manual epilogue only restores EBP. The other call-preserved registers are EBX, ESI, and EDI. (And ESP; i386 SysV is a caller-pops convention.) EAX, ECX, and EDX are call-clobbered, but if a compiler needs more regs at the same time, or ones that will survive across other function calls, it will make a prologue/epilogue that saves/restores other call-preserved registers.

This hacky epilogue will still return to main (assuming f isn't inlined), but without restoring any other call-preserved registers, violating the ABI. If you'd just written g(); at the bottom of f, GCC's epilogue would run before a jmp tailcall, so all call-preserved registers would be restored when control returned to main (directly from the ret in g.)

Also of course GCC wouldn't use a function pointer, it would jmp g which will assemble to a direct jmp rel32, or maybe even jmp rel8 since the functions are small enough to be next to each other.
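For comparison, the plain-C version described above looks like the sketch below (ignoring the original puzzle's rule against calling g() directly). The counter g_calls is a hypothetical addition to make the call observable. At -O0 GCC emits an ordinary call g followed by its own epilogue; with optimization enabled, sibling-call optimization can turn the last statement into the epilogue followed by jmp g.

```c
#include <stdio.h>

static int g_calls = 0;   /* hypothetical counter so the call is observable */

void g(void)
{
    g_calls++;
    printf("now inside g()!\n");
}

void f(void)
{
    printf("now inside f()!\n");
    g();   /* last statement: the compiler may emit its epilogue, then jmp g */
}
```

Either way the compiler writes the epilogue itself, so every call-preserved register it used gets restored before control reaches g.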

So again, this hack is not general; only works correctly in this trivial example.

Improved version that "works" with modern GCC

#include <stdio.h>

#if (__GNUC__ > 4) || \
         ((__GNUC__ == 4) && (__GNUC_MINOR__ > 4))
 #define UNREACHABLE() __builtin_unreachable()  // apparently new in GCC4.5
#else
 #define UNREACHABLE() do{}while(0)
#endif
// all this mess is instead of just using __builtin_unreachable(), so this code also works on GCC before 4.5

void g() {
        printf("now inside g()!\n");
}

void f()
{
        printf("now inside f()!\n");

        /* x86 function epilogue-like inline assembly */
        /* Causes f() to return to g() on its way back to main() */
        asm(
                "mov %%ebp, %%esp;"
                "pop %%ebp;"    // could have just used leave
                "jmp *%0;"      // indirect jump to function pointer, like push/ret but without disturbing branch prediction for call/ret
                : /* no output registers */
                : "r" (&g)      // ask for a function pointer in a register, compiler's choice
                : "memory"   //"%ebp", "%esp"
               );
               UNREACHABLE();
}

int main (void)
{
        f();
        return 0;
}

You could also ask for g as an immediate input so you could use it with a direct jump, like an "i"(g) (https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html), for "jmp %P0" to print it as jmp g (see docs for operand modifiers). This works in a non-PIE executable; with -fPIE gcc complains. You could of course just write jmp g if you know you're compiling for a target like Linux which doesn't decorate symbol names, or jmp _g if compiling for Windows or macOS.

Godbolt. I get the same result as with just commenting out the EBP clobber in the original, but it's maybe slightly more robust, and slightly less clunky (jmp instead of push/ret).

$ gcc -m32 stack.c
$ ./a.out 
now inside f()!
now inside g()!
$ echo $?
0

The __builtin_unreachable() after the asm statement gets GCC to not emit its own epilogue for that function:

# GCC15 -m32 -O0
f:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        subl    $12, %esp
        pushl   $.LC1
        call    puts
        addl    $16, %esp
        movl    $g, %eax
        mov %ebp, %esp;pop %ebp;jmp *%eax;

due to constraints and assumptions from and for compilers (e.g. ABI calling conventions, optimizations, security, etc.), things that are expected to be in a certain way (e.g. register contents), can end up being badly overwritten,

Unless you're making function calls from inside an asm statement, the ABI's calling convention doesn't matter. You tell the compiler what the inputs/outputs are, and whether you need them in register, memory, or its choice. It chooses registers and/or invents addressing-modes.
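A minimal sketch of that division of labour, assuming GCC or Clang targeting x86 (32-bit or 64-bit): the asm template names no registers at all, only operands. The function name twice is hypothetical, and on other targets the sketch falls back to plain C so it still compiles.

```c
/* Returns 2*x. On x86 the compiler picks both registers; the asm only
 * says "input in some register, output in some other register". */
int twice(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    __asm__("mov %1, %0\n\t"    /* result = x      (AT&T: source, destination) */
            "add %1, %0"        /* result += x */
            : "=&r"(result)     /* early-clobber: written before x is last read */
            : "r"(x)            /* x in any general-purpose register */
            : "cc");            /* add modifies the flags */
    return result;
#else
    return x + x;               /* fallback for non-x86 targets */
#endif
}
```

No specific register appears in the clobber list, so the compiler stays free to allocate around the statement and there is nothing for it to save or restore on the asm's behalf.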

Doing crazy stuff like jumping out of an asm statement is already tricky, but you can put __builtin_unreachable() after asm() to tell the compiler about it.

People probably have done crazy stuff that happened to work in versions of GCC they were using. The documentation for inline asm has evolved a lot to be more clear about what is or isn't supported. (@David Wohlferd could probably say more about that, having worked on the docs and written DontUseInlineAsm).