c++ assembly visual-c++ x86 buffer-overflow

Why are functions b and f called *twice* in this code after b overwrites its return address with &f (32-bit MSVC debug build)?


I have some very strange code which, as far as I understand, overwrites the return address of function b so that f is called when b returns. What I don't understand is why, after f has run, execution returns to main and b is called again from there. P.S.: The code only works on a 32-bit system.

#include <iostream>
int f() {
    std::cout << "Hello";
    return 2;
}

int b() {
    int *m[1];
    m[3] = (int *)&f;
    return 1;
}

int main() {
    return b();
}

I tried stepping through the disassembly, but it didn't reveal anything obvious.

Assembly main:

int main() {
004724E0  push        ebp  
004724E1  mov         ebp,esp  
004724E3  sub         esp,0C0h  
004724E9  push        ebx  
004724EA  push        esi  
004724EB  push        edi  
004724EC  mov         edi,ebp  
004724EE  xor         ecx,ecx  
004724F0  mov         eax,0CCCCCCCCh  
004724F5  rep stos    dword ptr es:[edi]  
004724F7  mov         ecx,offset _666773A0_main@cpp (047E068h)  
004724FC  call        @__CheckForDebuggerJustMyCode@4 (0471389h)  
00472501  nop  
    return b();
00472502  call        b (0471456h)  
}
00472507  pop         edi  
00472508  pop         esi  
00472509  pop         ebx  
0047250A  add         esp,0C0h  
00472510  cmp         ebp,esp  
00472512  call        __RTC_CheckEsp (0471294h)  
00472517  mov         esp,ebp  
00472519  pop         ebp  
0047251A  ret

Assembly b:

int b() {
004722A0  push        ebp  
004722A1  mov         ebp,esp  
004722A3  sub         esp,0CCh  
004722A9  push        ebx  
004722AA  push        esi  
004722AB  push        edi  
004722AC  lea         edi,[ebp-0Ch]  
004722AF  mov         ecx,3  
004722B4  mov         eax,0CCCCCCCCh  
004722B9  rep stos    dword ptr es:[edi]  
004722BB  mov         ecx,offset _666773A0_main@cpp (047E068h)  
004722C0  call        @__CheckForDebuggerJustMyCode@4 (0471389h)  
004722C5  nop  
    int *m[1];
    m[3] = (int *)&f;
004722C6  mov         eax,4  
004722CB  imul        ecx,eax,3  
004722CE  mov         dword ptr m[ecx],offset f (0471172h)  
    return 1;
004722D6  mov         eax,1  
}
004722DB  push        edx  
004722DC  mov         ecx,ebp  
004722DE  push        eax  
004722DF  lea         edx,ds:[472300h]  
004722E5  call        @_RTC_CheckStackVars@8 (047122Bh)  
004722EA  pop         eax  
004722EB  pop         edx  
004722EC  pop         edi  
004722ED  pop         esi  
004722EE  pop         ebx  
004722EF  add         esp,0CCh  
004722F5  cmp         ebp,esp  
004722F7  call        __RTC_CheckEsp (0471294h)  
004722FC  mov         esp,ebp  
004722FE  pop         ebp  
004722FF  ret  
00472300  add         dword ptr [eax],eax  
00472302  add         byte ptr [eax],al  
00472304  or          byte ptr [ebx],ah  
00472306  inc         edi  
00472307  add         al,bh  
00472309  ?? ?????? 
0047230A  ?? ?????? 

Assembly f:

int f() {
00472400  push        ebp  
00472401  mov         ebp,esp  
00472403  sub         esp,0C0h  
00472409  push        ebx  
0047240A  push        esi  
0047240B  push        edi  
0047240C  mov         edi,ebp  
0047240E  xor         ecx,ecx  
00472410  mov         eax,0CCCCCCCCh  
00472415  rep stos    dword ptr es:[edi]  
00472417  mov         ecx,offset _666773A0_main@cpp (047E068h)  
0047241C  call        @__CheckForDebuggerJustMyCode@4 (0471389h)  
00472421  nop  
    std::cout << "Hello";
00472422  push        offset string "Hello" (0479B30h)  
00472427  mov         eax,dword ptr [__imp_std::cout (047D0C8h)]  
0047242C  push        eax  
0047242D  call        std::operator<<<std::char_traits<char> > (04711A9h)  
00472432  add         esp,8  
    return 2;
00472435  mov         eax,2  
}
0047243A  pop         edi  
0047243B  pop         esi  
0047243C  pop         ebx  
0047243D  add         esp,0C0h  
00472443  cmp         ebp,esp  
00472445  call        __RTC_CheckEsp (0471294h)  
0047244A  mov         esp,ebp  
0047244C  pop         ebp  
0047244D  ret  

I can provide more code if needed.


Solution

  • Update after disassembly was posted: Jester commented with the answer:

    Once f tries to return, it will pop off the next item from the stack and jump there. Looking at the assembly code, that will be the value stored by the push edi at 004724EB in main.

    No telling what edi contains at that point, but it sounds like you got lucky and it was a valid code address that eventually happened to invoke b again.

    Like I said below, normally we'd expect it to just crash since it's not common to have another valid code address on the stack right above a return address. But in this case we do, probably since main's caller has to deal with addresses and probably has them in registers, and MSVC debug-mode's main uses EDI for rep stosd to poison some stack memory so it has to save/restore it.


    Original answer, written before the question had any necessary details

    What compiler with what options, targeting what ISA?
    (Updated re: your edit: that's 32-bit x86. From mov eax,0CCCCCCCCh / rep stosd, that's MSVC in a debug build, poisoning stack memory so uninitialized variables have a recognizable bit-pattern. And MSVC pads between functions with int3 instructions, not NOPs like GCC/Clang use, so fall-through between functions is less plausible unless one happened to be a multiple of 16 machine-code bytes.)


    This code has undefined behaviour that's visible at compile time, so there's no guarantee the compiler compiled it to asm that actually stores to an array on the stack and returns.

    For some kinds of UB, some compilers (notably Clang and sometimes GCC) assume that path of execution is unreachable and stop emitting instructions for it, including omitting the ret at the end of a function so execution falls into whatever comes next in the binary. That mostly happens with optimization enabled, although this UB is visible even without optimization.


    If it does compile as-written, you're overwriting something on the stack with a function address. If the thing at m[3] is the function's return address, then you'll return to f instead of the call-site that originally set the return address.

    So things are already weird, and I'd normally expect it to crash when f tries to return. IDK why there'd be another valid return address right above the one b popped to get to f.


    The code only works on a 32-bit system

    32-bit x86, I assume? There are other 32-bit ISAs, including ARM and RISC-V, which are widely available on hobby boards.

    On x86-64 at least, the stack pointer will be misaligned on entry to f: RSP % 16 == 0 before a call is required in x86-64 calling conventions, so RSP % 16 == 8 on function entry after the call instruction pushes an 8-byte return address. Reaching f via ret instead of call leaves RSP % 16 == 0 on entry. On many other ISAs, call / bl / jal just puts the return address in a register (the "link register"), so stack alignment is the same on function entry as at a call-site within another function.

    And cout<< functions might well do something that relies on 16-byte RSP alignment, like using movaps to copy 16 bytes to and/or from a stack variable. Glibc printf and scanf do.

    But of course, the stack frame layout is different on x86-64, so the return address won't necessarily be at m[3]; you'd likely need a different offset into m[]. (m is an array of int*, so each element is 8 bytes there and the store does write a full pointer.) With the right index, this ROP demo could still work, for example in a Linux non-PIE executable (gcc -fno-pie -no-pie -fno-stack-protector).