This comes from a question about invoking a trivial buffer overflow to jump to a function that is present in the source but never called explicitly anywhere in the program (2333909/how-can-i-invoke-buffer-overflow). An answerer posted an interesting detour there: an inline assembly alternative that tries to force a premature epilogue, redirecting to function g().
Both the question and the answer are from 2010.
Question context: reading posts and blogs such as DontUseInlineAsm, it feels like inline assembly started out, at least for aficionados, as a harmless, naive and witty thing to tinker with. With the passage of time and growing sophistication, though, due to constraints and assumptions from and for compilers (e.g. ABI calling conventions, optimizations, security, etc.), things that are expected to be a certain way (e.g. register contents) can end up being badly overwritten. In the worst cases this invites silent undefined behavior; in the best cases it produces loud errors or crashes.
Definitely not a trivial topic: https://stackoverflow.com/tags/inline-assembly/info
My question: is it possible to update the code below so that it works by today's standards, while keeping its spirit? And, of course, what is the human-understandable rationale behind those errors? Why doesn't this code work nowadays, and why isn't it possible to brute-force a forged epilogue in this manner?
Related considerations: does an alternative like this still make sense today? Or is the number of things a human programmer has to take into account when writing inline assembly already unfathomable, with things only going to get worse?
To my understanding, this part is the clobber list (telling the compiler which registers the programmer's asm is going to touch):
: "%ebp", "%esp"
I don't know what this means in this context (I suspect it has a lot to do with the address of g(), the function we want to jump to):
: "r" (&g)
Compiling it today:
gcc -m32 -g -o overflow_inlineassembly overflow_inlineassembly.c
overflow_inlineassembly.c: In function ‘f’:
overflow_inlineassembly.c:18:9: warning: listing the stack pointer register ‘%esp’ in a clobber list is deprecated [-Wdeprecated]
18 | asm(
| ^~~
overflow_inlineassembly.c:18:9: note: the value of the stack pointer after an ‘asm’ statement must be the same as it was before the statement
overflow_inlineassembly.c:27:1: error: bp cannot be used in ‘asm’ here
27 | }
| ^
Code (my typing):
#include <stdio.h>
void g()
{
printf("now inside g()!\n");
}
void f()
{
printf("now inside f()!\n");
// can only modify this section
// can't call g(), maybe use g (pointer to function)
/* x86 function epilogue-like inline assembly */
/* Causes f() to return to g() on its way back to main() */
asm(
"mov %%ebp, %%esp;"
"pop %%ebp;"
"push %0;"
"ret"
: /* no output registers */
: "r" (&g)
: "%ebp", "%esp"
);
}
int main (int argc, char * argv[])
{
f();
return 0;
}
It should still work as well as before (which isn't saying much) if you remove the "%esp", "%ebp" clobbers and compile without enabling optimization. It works for me on Linux with GCC 14.2, using gcc -m32 stack.c: the program exits without a segfault after printing both messages.
It depends on -fno-omit-frame-pointer, which is the default at -O0 even in current GCC/Clang. The comments also assume f() won't inline into main, which -O0 definitely won't do.
I don't think there's any way to make this actually safe and portable, so that it compiles to working asm with or without optimization and with arbitrary other code earlier in f() and main(). You can't know what other registers your epilogue should be restoring, even if you do something like alloca or a VLA (e.g. volatile int foo[argc]; foo[0] = 1;) that forces the compiler to use a frame pointer for that function.
This is always going to be a dirty hack that makes assumptions about the compiler's code-gen, like the original from 2010 was.
The hand-written asm is only compatible with what modern GCC does in a debug build (i.e. with -fno-omit-frame-pointer), and it makes the most sense when the compiler doesn't inline f into main. (With EBP as a frame pointer it's an error to clobber EBP, although very old GCC didn't detect that with optimization disabled.)
GCC history:
The default with no options is, and always has been, -O0 -fno-omit-frame-pointer, so EBP gets used as a frame pointer (for GCC, even in leaf functions). When I mention these options, I'm contrasting the behaviour they imply against other choices, not distinguishing between using them explicitly and getting them by default.
With -m64, optimization has always implied -fomit-frame-pointer. (Maybe only with -O2 or -O3 in early GCC; currently even -O1 implies -fomit-frame-pointer for 64-bit code.)
With -m32, GCC 4.6 and later: -O1 or higher implies -fomit-frame-pointer.
With -m32 in older GCC: -fno-omit-frame-pointer is still the default even at -O3.
The i386 ABI (as used on Linux) changed somewhere around 2011 to enforce 16-byte stack alignment. That followed problems discovered in 2009, which showed GCC had already been making binaries that depended on 16-byte stack alignment (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838#c86 / Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?). Other OSes kept the old 4-byte-alignment ABI, where 16-byte alignment is only something GCC maintains for performance. This is probably not a factor here. -mpreferred-stack-boundary=4 (2^4 = 16) was the default even in ancient GCC, and it is the reason for the subl $8, %esp before call printf. Using EBP as a frame pointer lets an epilogue ignore that extra offset.
The warning about ESP is due to https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1
Another restriction is that the clobber list should not contain the stack pointer register. This is because the compiler requires the value of the stack pointer to be the same after an asm statement as it was on entry to the statement. However, previous versions of GCC did not enforce this rule and allowed the stack pointer to appear in the list, with unclear semantics. This behavior is deprecated and listing the stack pointer may become an error in future versions of GCC.
(ESP is the stack pointer in 32-bit x86; EBP is the frame pointer in functions that use one.)
Execution doesn't come out the bottom of the asm statement; it jumps to g as a tailcall instead. So the ESP/EBP clobbers are pretty much irrelevant. You can just leave them out if you want to try silly computer tricks that depend on assumptions like debug builds.
A "memory" clobber would be a good idea, to force all earlier side-effects to be done before the asm statement. So would a __builtin_unreachable() after the asm statement, to tell the compiler execution doesn't come out the bottom, as the manual recommends for cases where asm goto isn't usable. Neither can make this fully safe.
The error about EBP is a full error, although again it's not really relevant here because execution doesn't come out the bottom of the asm statement. On Godbolt the error happens even in GCC 3.4.2 (from 2006; 3.4.0 was 2004) and GCC 4.1.2 (from 2007 / 2006). Those old compilers only report it with optimization enabled; they fail to detect the error at the default -O0.
That's just a matter of not checking for something that was already a problem. Suppose execution did come out the bottom of the asm statement with a modified EBP, in a function using an EBP frame pointer. It would crash, because GCC emits asm that depends on EBP being unchanged. For example, say you used an "ebp" clobber to correctly describe asm("xor %%ebp, %%ebp" ::: "ebp") to the compiler. Execution would then reach GCC's leave; ret with EBP=0. leave works like mov %ebp, %esp; pop %ebp, so it would set ESP=0 and then try to pop EBP from address 0, page-faulting on unmapped memory under typical OSes. For example:
void crashme(){
asm("xor %%ebp, %%ebp" ::: "ebp");
}
compiles to this broken asm, despite the fact that we told GCC our asm clobbers EBP.
# GCC4.1 -m32 -O0
crashme:
pushl %ebp
movl %esp, %ebp
xor %ebp, %ebp
leave # acts like mov %ebp, %esp ; pop %ebp
ret
With -m32 -O1 -fomit-frame-pointer, we get safe code: a push/pop of EBP around xor %ebp, %ebp. There is no mov %esp, %ebp, because GCC is just saving and restoring EBP, not using it as a frame pointer. It could have saved EBP in a register the asm didn't clobber, like ECX. That would rarely be a useful optimization for GCC to look for, though, except in code-bases with lots of badly-written inline asm that clobbers specific registers instead of letting the compiler pick.
With -m32 -O1 -fno-omit-frame-pointer, GCC correctly refuses to compile it, with an error. In this case the asm statement didn't have any "m"(local_var) operands where the compiler would want to use EBP as part of the addressing mode. The compiler could have stashed the value in a register the asm statement wasn't using, but handling that corner case would take extra code for little value. Inline asm is already a pretty niche feature.
Out of the compilers on Godbolt, GCC 4.4.7 is the earliest one that rejects this code even with -m32 -O0. (GCC 4.4.0 is from 2009 and 4.4.7 is from 2012, but point releases probably didn't change that.) Godbolt doesn't have GCC 4.2 or 4.3.
Those early GCCs that don't error on this at -O0 also make ABI-violating code with -O0 -fomit-frame-pointer: xor %ebp, %ebp; ret. That destroys the caller's EBP, even though EBP is supposed to be a call-preserved register.
You can use an "%ebp" clobber if you compile with -fomit-frame-pointer (at -O0 or with optimization), but that's not the default in those old GCCs. And of course this hacked-up manual epilogue assumes an EBP frame pointer.
So anyway, @jschmier's original didn't even compile with optimization enabled, even with old compilers from before 2010. It also depends on -fno-omit-frame-pointer, whether by default or specified manually. But it did happen to compile as intended with optimization disabled (the default) on GCC 4.1 or earlier.
If f were inlined into main, we'd be tailcalling g from main. That skips any code between the f() call and main's closing }, including the return 0; or even the implicit C99 return 0. main would then effectively return g();, with whatever garbage void g() leaves in EAX.
With GCC 4.0.4 -O3, and with the %ebp clobber removed so GCC will compile it, f inlines into main. The stand-alone definition of f isn't reached if you actually run the program.
main:
pushl %ebp
movl %esp, %ebp # set up EBP as a frame pointer
subl $8, %esp # align the stack
andl $-16, %esp # realign the stack because this is main
subl $16, %esp # allocate arg-passing space since -maccumulate-outgoing-args is the default for -mtune=generic in old GCC; old CPUs had slow PUSH
movl $.LC1, (%esp) # the inside f() message
call puts
movl $g, %eax
# inline asm starts here
mov %ebp, %esp;pop %ebp;jmp *%eax;
# inline asm ends here
leave # unreachable, the jmp or push/ret tailcalls g
xorl %eax, %eax # main's return 0, won't be reached.
ret
This tailcalls g from main. I copied the asm from Godbolt to my desktop, added .global main to it, and built it into an executable with gcc -m32 -no-pie stack-inlined.S. I didn't need to disable the directive filter on Godbolt. The only data is read-only, and it still works with that data in the .text section mixed with code and with the machine code unaligned. That only has a performance downside.
Running the resulting ./a.out, I still get both messages printed with no segfault, but the process exit status (echo $?) is 16. That is the return value of the last puts, which on GNU/Linux happens to be the number of characters written, including the newline. main tailcalls g, so g returns to __libc_start_call_main (defined in the glibc source code). After main returns, that function just calls exit with main's return value (i.e. push %eax; call exit).
So this code still sort of works with optimization enabled (as long as we force GCC to use frame pointers), but there are places where code gets skipped.
This manual epilogue only restores EBP. The other call-preserved registers are EBX, ESI and EDI (and ESP; i386 SysV is a caller-pops convention). EAX, ECX and EDX are call-clobbered. If a compiler needs more registers at the same time, or ones that will survive across other function calls, it will make a prologue/epilogue that saves and restores other call-preserved registers.
This hacky epilogue will still return to main (assuming f isn't inlined), but without restoring any other call-preserved registers, which violates the ABI. If you'd just written g(); at the bottom of f, GCC's epilogue would run before a jmp tailcall. All call-preserved registers would then be restored when control returned to main, directly from the ret in g.
Also, of course, GCC wouldn't use a function pointer. It would emit jmp g, which assembles to a direct jmp rel32, or maybe even jmp rel8 since the functions are small enough to be next to each other.
So again, this hack is not general; only works correctly in this trivial example.
#include <stdio.h>
#if (__GNUC__ > 4) || \
((__GNUC__ == 4) && (__GNUC_MINOR__ > 4))
#define UNREACHABLE() __builtin_unreachable() // apparently new in GCC4.5
#else
#define UNREACHABLE() do{}while(0)
#endif
// all this mess is instead of just using __builtin_unreachable(), so this code also works on GCC before 4.5
void g() {
printf("now inside g()!\n");
}
void f()
{
printf("now inside f()!\n");
/* x86 function epilogue-like inline assembly */
/* Causes f() to return to g() on its way back to main() */
asm(
"mov %%ebp, %%esp;"
"pop %%ebp;" // could have just used leave
"jmp *%0;" // indirect jump to function pointer, like push/ret but without disturbing branch prediction for call/ret
: /* no output registers */
: "r" (&g) // ask for a function pointer in a register, compiler's choice
: "memory" //"%ebp", "%esp"
);
UNREACHABLE();
}
int main (void)
{
f();
return 0;
}
You could also ask for g as an immediate input so you can use it with a direct jump, via an "i"(g) constraint (https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html) and "jmp %P0" to print it as jmp g (see the docs for operand modifiers). This works in a non-PIE executable; with -fPIE, GCC complains. You could of course just write jmp g if you know you're compiling for a target like Linux, which doesn't decorate symbol names, or jmp _g if compiling for Windows or macOS.
Godbolt. I get the same result as with just commenting out the EBP clobber in the original, but it's maybe slightly more robust, and slightly less clunky (jmp instead of push/ret).
$ gcc -m32 stack.c
$ ./a.out
now inside f()!
now inside g()!
$ echo $?
0
The __builtin_unreachable() after the asm statement gets GCC to not emit its own epilogue for that function:
# GCC15 -m32 -O0
f:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
subl $12, %esp
pushl $.LC1
call puts
addl $16, %esp
movl $g, %eax
mov %ebp, %esp;pop %ebp;jmp *%eax;
From the question: "due to constraints and assumptions from and for compilers (e.g. ABI calling conventions, optimizations, security, etc.), things that are expected to be in a certain way (e.g. register contents), can end up being badly overwritten"
Unless you're making function calls from inside an asm statement, the ABI's calling convention doesn't matter. You tell the compiler what the inputs and outputs are, and whether you need them in registers, in memory, or wherever the compiler chooses. It then picks registers and/or invents addressing modes.
Doing crazy stuff like jumping out of an asm statement is already tricky, but you can put __builtin_unreachable() after the asm() to tell the compiler about it.
People have probably done crazy stuff that happened to work in the versions of GCC they were using. The documentation for inline asm has evolved a lot to be clearer about what is or isn't supported. (@David Wohlferd could probably say more about that, having worked on the docs and written DontUseInlineAsm.)