Tags: assembly, x86, reverse-engineering, decompiling, ghidra

x86 ASM using Ghidra, understanding decompiler results for an inlined function call


#include <cstdio>

int addition(int a, int b) {
    int funcA = a;
    int funcB = b;
    return funcA + funcB;
}

int main() {
    volatile int count1 = 8;

    volatile int count2 = 9;

    volatile int anotherRandomVar = 0;


    volatile int result = addition(count1, count2);

    std::printf("main func is here");

    return 0;
}
                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
                               assume GS_OFFSET = 0xff00000000
             int               EAX:4          <RETURN>
             int               ECX:4          _Argc
             char * *          RDX:8          _Argv
             char * *          R8:8           _Env
             undefined4        Stack[0x18]:4  local_res18                             XREF[1]:     14000108b(W)  
             undefined4        Stack[0x10]:4  local_res10                             XREF[2]:     140001074(W), 
                                                                                                   140001097(R)  
             undefined4        Stack[0x8]:4   local_res8                              XREF[2]:     140001083(W), 
                                                                                                   140001093(R)  
                             main                                            XREF[2]:     __scrt_common_main_seh:1400012cb
                                                                                          14000400c(*)  
       140001070 48 83 ec 28     SUB        RSP,0x28
       140001074 c7 44 24        MOV        dword ptr [RSP + local_res10],0x8
                 38 08 00 
                 00 00
       14000107c 48 8d 0d        LEA        _Argc,[s_main_func_is_here]                      = "main func is here"
                 cd 11 00 00
       140001083 c7 44 24        MOV        dword ptr [RSP + local_res8],0x9
                 30 09 00 
                 00 00
       14000108b c7 44 24        MOV        dword ptr [RSP + local_res18],0x0
                 40 00 00 
                 00 00
       140001093 8b 44 24 30     MOV        EAX,dword ptr [RSP + local_res8]
       140001097 8b 44 24 38     MOV        EAX,dword ptr [RSP + local_res10]
       14000109b e8 70 ff        CALL       printf                                           int printf(char * _Format, ...)
                 ff ff
       1400010a0 33 c0           XOR        EAX,EAX
       1400010a2 48 83 c4 28     ADD        RSP,0x28
       1400010a6 c3              RET

This was the original disassembly. After assigning the correct types and names to the locals, I now have:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
                               assume GS_OFFSET = 0xff00000000
             int               EAX:4          <RETURN>
             int               ECX:4          _Argc
             char * *          RDX:8          _Argv
             char * *          R8:8           _Env
             int               Stack[0x18]:4  anotherRandomVar                        XREF[1]:     14000108b(W)  
             int               Stack[0x10]:4  count1                                  XREF[2]:     140001074(W), 
                                                                                                   140001097(R)  
             int               Stack[0x8]:4   count2                                  XREF[2]:     140001083(W), 
                                                                                                   140001093(R)  
                             main                                            XREF[2]:     __scrt_common_main_seh:1400012cb
                                                                                          14000400c(*)  
       140001070 48 83 ec 28     SUB        RSP,0x28
       140001074 c7 44 24        MOV        dword ptr [RSP + count1],0x8
                 38 08 00 
                 00 00
       14000107c 48 8d 0d        LEA        _Argc,[s_main_func_is_here]                      = "main func is here"
                 cd 11 00 00
       140001083 c7 44 24        MOV        dword ptr [RSP + count2],0x9
                 30 09 00 
                 00 00
       14000108b c7 44 24        MOV        dword ptr [RSP + anotherRandomVar],0x0
                 40 00 00 
                 00 00
       140001093 8b 44 24 30     MOV        EAX,dword ptr [RSP + count2]
       140001097 8b 44 24 38     MOV        EAX,dword ptr [RSP + count1]
       14000109b e8 70 ff        CALL       printf                                           int printf(char * _Format, ...)
                 ff ff
       1400010a0 33 c0           XOR        EAX,EAX
       1400010a2 48 83 c4 28     ADD        RSP,0x28
       1400010a6 c3              RET

Once I had applied the types, the decompiler gave me the following. Should I take it that the decompiler is not very helpful here?

int __cdecl main(int _Argc,char **_Argv,char **_Env)

{
  int count2;
  int count1;
  int anotherRandomVar;
  
  printf("main func is here");
  return 0;
}

The decompiled body was empty before I applied the types.

My main question is about these lines:

       140001093 8b 44 24 30     MOV        EAX,dword ptr [RSP + count2]
       140001097 8b 44 24 38     MOV        EAX,dword ptr [RSP + count1]

I assume this is where the two values count1 and count2 are passed to the addition function. What happens when both are moved into EAX? My first guess was that the function is called after this and somehow reads the last two values placed in EAX, so the order of operations matters and each value is consumed right after it is loaded. Or am I confusing EAX with how the stack works? I assume the call was optimized because the function only adds two numbers, but I want to see where the addition takes place.

Where is the result of volatile int result = addition(count1, count2); stored?

edit 1:

I compiled and built the EXE in x86 release mode instead of x64 and the ASM seems to be correct now:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
                               assume FS_OFFSET = 0xffdff000
             int               EAX:4          <RETURN>
             int               Stack[0x4]:4   _Argc
             char * *          Stack[0x8]:4   _Argv
             char * *          Stack[0xc]:4   _Env
             undefined4        Stack[-0x8]:4  local_8                                 XREF[2]:     0040104d(W), 
                                                                                                   0040105b(R)  
             undefined4        Stack[-0xc]:4  local_c                                 XREF[2]:     00401046(W), 
                                                                                                   0040105e(R)  
             undefined4        Stack[-0x10]:4 local_10                                XREF[2]:     00401054(W), 
                                                                                                   00401068(W)  
                             _main                                           XREF[1]:     __scrt_common_main_seh:00401241(
                             main
        00401040 55              PUSH       EBP
        00401041 8b ec           MOV        EBP,ESP
        00401043 83 ec 0c        SUB        ESP,0xc
        00401046 c7 45 f8        MOV        dword ptr [EBP + local_c],0x8
                 08 00 00 00
        0040104d c7 45 fc        MOV        dword ptr [EBP + local_8],0x9
                 09 00 00 00
        00401054 c7 45 f4        MOV        dword ptr [EBP + local_10],0x0
                 00 00 00 00
        0040105b 8b 4d fc        MOV        ECX,dword ptr [EBP + local_8]
        0040105e 8b 45 f8        MOV        EAX,dword ptr [EBP + local_c]
        00401061 03 c1           ADD        EAX,ECX
        00401063 68 00 21        PUSH       s_main_func_is_here                              = "main func is here"
                 40 00
        00401068 89 45 f4        MOV        dword ptr [EBP + local_10],EAX
        0040106b e8 a0 ff        CALL       printf                                           int printf(char * _Format, ...)
                 ff ff
        00401070 83 c4 04        ADD        ESP,0x4
        00401073 33 c0           XOR        EAX,EAX
        00401075 8b e5           MOV        ESP,EBP
        00401077 5d              POP        EBP
        00401078 c3              RET

edit 2:

I had also forgotten to recompile after changing int result to volatile. The x64 disassembly is now:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
                               assume GS_OFFSET = 0xff00000000
             int               EAX:4          <RETURN>
             int               ECX:4          _Argc
             char * *          RDX:8          _Argv
             char * *          R8:8           _Env
             int               Stack[0x18]:4  local_res18                             XREF[1]:     140001084(W)  
             int               Stack[0x10]:4  local_res10                             XREF[2]:     140001074(W), 
                                                                                                   140001090(R)  
             int               Stack[0x8]:4   local_res8                              XREF[3]:     14000107c(W), 
                                                                                                   14000108c(R), 
                                                                                                   140001096(W)  
                             main                                            XREF[2]:     __scrt_common_main_seh:1400012cb
                                                                                          14000400c(*)  
       140001070 48 83 ec 28     SUB        RSP,0x28
       140001074 c7 44 24        MOV        dword ptr [RSP + local_res10],0x8
                 38 08 00 
                 00 00
       14000107c c7 44 24        MOV        dword ptr [RSP + local_res8],0x9
                 30 09 00 
                 00 00
       140001084 c7 44 24        MOV        dword ptr [RSP + local_res18],0x0
                 40 00 00 
                 00 00
       14000108c 8b 4c 24 30     MOV        _Argc,dword ptr [RSP + local_res8]
       140001090 8b 44 24 38     MOV        EAX,dword ptr [RSP + local_res10]
       140001094 03 c8           ADD        _Argc,EAX
       140001096 89 4c 24 30     MOV        dword ptr [RSP + local_res8],_Argc
       14000109a 48 8d 0d        LEA        _Argc,[s_main_func_is_here]                      = "main func is here"
                 af 11 00 00
       1400010a1 e8 6a ff        CALL       printf                                           int printf(char * _Format, ...)
                 ff ff
       1400010a6 33 c0           XOR        EAX,EAX
       1400010a8 48 83 c4 28     ADD        RSP,0x28
       1400010ac c3              RET

So given the non-volatile disassembly, when reversing, how would I figure out that some addition was taking place? It's optimised as you said, but how would I go about uncovering this? (If I was reverse engineering with no source) Maybe there isn't a straightforward answer and it's more about breaking everything down, or maybe there is something that removes these optimisations in the disassembler?


Solution

  • Your original disassembly contains no instructions for the add or for the final assignment in volatile int result = addition(count1, count2);. The source you actually compiled probably omitted volatile on int result, so that statement could be optimized away except for the two MOV loads. Those come from the two volatile reads, whose results are unused, like (void)count1; (void)count2;.

    Volatile accesses count as visible side effects, so they show up in the asm. Other work only appears if the compiler needs it to produce some other visible result.
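
    As a minimal sketch of that rule (the function names discarded_sum and kept_sum are hypothetical, not from the question), the first function below leaves an optimizer nothing it must keep except the volatile accesses to count1 and count2, while the second makes the add and the store to result part of the visible behaviour:

```cpp
// Sketch only: function names are made up for illustration.

int addition(int a, int b) { return a + b; }

// Non-volatile result: the sum is never observed, so an optimizing compiler
// may delete the add and the store. The volatile reads of count1 and count2
// must still be performed.
int discarded_sum() {
    volatile int count1 = 8;
    volatile int count2 = 9;
    int result = addition(count1, count2);  // dead value
    (void)result;
    return 0;
}

// Volatile result: the store to `result` is itself a visible side effect,
// so the add that produces the stored value has to be computed.
int kept_sum() {
    volatile int count1 = 8;
    volatile int count2 = 9;
    volatile int result = addition(count1, count2);
    return result;  // one more volatile read
}
```

    Both functions behave the same at any optimization level when called; the difference only shows in the generated asm, for example when comparing the two on a compiler explorer.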

    I'm not surprised that Ghidra doesn't show initializers for the local variables, or that it doesn't mark them volatile. At -O0, compilers like gcc/clang/msvc/icc all spill variables to memory between statements, much as if everything were volatile, so a decompiler that inferred volatile from store/reload patterns would flag every local in a debug build as a false positive. A decompiler tries to identify computations that lead to something visible outside the function; stores and reloads of locals that never get used any other way are just noise.

    I was going to edit your question title to describe the actual situation you're asking about, like "Ghidra decompiler doesn't recover function call that was inlined", but I assume anyone capable of putting the question in those terms already understands that that's an answer. :/

    Even if the work hadn't been optimized away, it would be indistinguishable from result = count1 + count2;. The fact that there was a helper function which inlined is totally lost during compilation.
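
    To illustrate, assuming an optimizing build in which addition gets inlined, the two hypothetical functions below would typically compile to the same instructions, which is why a decompiler cannot recover the call:

```cpp
// Helper from the question, simplified. `static` allows a compiler to drop
// the standalone copy once every call has been inlined.
static int addition(int a, int b) { return a + b; }

// Written with the helper...
int sum_via_helper(int x, int y) { return addition(x, y); }

// ...and written directly. After inlining, both are just "add two ints".
int sum_direct(int x, int y) { return x + y; }
```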

    https://godbolt.org/z/KaGa9axs1 shows MSVC 19.35 x64 optimized code-gen for your code with and without volatile on result. It does add and store to result with volatile.

    "Given the non-volatile disassembly, when reversing, how would I figure out that some addition was taking place? It's optimised as you said, but how would I go about uncovering this? (If I was reverse engineering with no source)"

    You wouldn't see the addition: it doesn't exist in the compiled program because it's not part of the observable behaviour of the original program. (C and C++ define observable behaviour to include I/O and volatile accesses.) That's why it got optimized away at compile time.

    That makes sense, but how are things like constant values reversible if they are not present in the disassembly? Is the work on the constants done at compile time and then stripped away if it won't be needed again, as in this case? And if so, back to my original point: how can I reverse them, or even know of their existence to begin with?

    A decompiler can't tell how constants were written in the source. For example, passing PROT_EXEC|PROT_WRITE|PROT_READ as the protection argument to mmap is indistinguishable from passing the raw number (0x7 on Linux). A smart decompiler that knows about some system-call bit-flags might decompose things for you, like strace does.
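
    A rough sketch of that kind of decomposition, assuming the Linux values for the three protection flags (decode_prot and the k-prefixed constants are hypothetical names, not part of Ghidra or strace):

```cpp
#include <string>

// Flag values as defined on Linux; other systems may use different numbers.
constexpr int kProtRead  = 0x1;  // PROT_READ
constexpr int kProtWrite = 0x2;  // PROT_WRITE
constexpr int kProtExec  = 0x4;  // PROT_EXEC

// Hypothetical helper: rebuild a symbolic expression from a raw prot value.
std::string decode_prot(int prot) {
    if (prot == 0) return "PROT_NONE";
    std::string out;
    auto add = [&](int bit, const char* name) {
        if (prot & bit) {
            if (!out.empty()) out += "|";
            out += name;
            prot &= ~bit;  // mark this bit as explained
        }
    };
    add(kProtRead, "PROT_READ");
    add(kProtWrite, "PROT_WRITE");
    add(kProtExec, "PROT_EXEC");
    if (prot != 0) {  // leftover bits the table doesn't know: print as decimal
        if (!out.empty()) out += "|";
        out += std::to_string(prot);
    }
    return out;
}
```

    Note that this only recovers one plausible spelling. The order of the flags, and whether the source used names at all, is still lost.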

    But in other cases, you'll just get stuff like malloc(12) not malloc(3*sizeof(int)). Again, unless the decompiler has patterns to invent a sizeof based on the type it deduces.
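
    As a small illustration (kCount, alloc_ints and folded_size are hypothetical names), the size expression below is folded to a single number at compile time, so only that number reaches the binary:

```cpp
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kCount = 3;  // hypothetical element count

// Source form: the intent (three ints) is visible.
int* alloc_ints() {
    return static_cast<int*>(std::malloc(kCount * sizeof(int)));
}

// What the compiler works with after constant folding: one number,
// which is 12 on platforms where sizeof(int) == 4.
constexpr std::size_t folded_size() { return kCount * sizeof(int); }
```

    The caller is responsible for passing the returned pointer to std::free.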

    Compiling + decompiling loses information about program structure, obviously, not just var names and comments. All constants become hard-coded numbers wherever they're used, so there's no way to tell whether two things with the same value were a coincidence or two uses of the same named constant.
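
    For example, in the sketch below (all names are hypothetical) two unrelated constants happen to share the value 16. After compilation each use is just the literal 16, or a value derived from it, with nothing to say which name it came from:

```cpp
// Two unrelated named constants that happen to share a value.
constexpr int kMaxRetries = 16;
constexpr int kTileWidth  = 16;

int retry_budget(int attempts_used) { return kMaxRetries - attempts_used; }
int tile_offset(int column)         { return column * kTileWidth; }
int tile_area()                     { return kTileWidth * kTileWidth; }  // folds to 256
```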