#include <cstdio>  // std::printf is declared here, not in <iostream>

int addition(int a, int b) {
    int funcA = a;
    int funcB = b;
    return funcA + funcB;
}

int main() {
    volatile int count1 = 8;
    volatile int count2 = 9;
    volatile int anotherRandomVar = 0;
    volatile int result = addition(count1, count2);
    std::printf("main func is here");
    return 0;
}
**************************************************************
* FUNCTION *
**************************************************************
int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
assume GS_OFFSET = 0xff00000000
int EAX:4 <RETURN>
int ECX:4 _Argc
char * * RDX:8 _Argv
char * * R8:8 _Env
undefined4 Stack[0x18]:4 local_res18 XREF[1]: 14000108b(W)
undefined4 Stack[0x10]:4 local_res10 XREF[2]: 140001074(W),
140001097(R)
undefined4 Stack[0x8]:4 local_res8 XREF[2]: 140001083(W),
140001093(R)
main XREF[2]: __scrt_common_main_seh:1400012cb
14000400c(*)
140001070 48 83 ec 28 SUB RSP,0x28
140001074 c7 44 24 MOV dword ptr [RSP + local_res10],0x8
38 08 00
00 00
14000107c 48 8d 0d LEA _Argc,[s_main_func_is_here] = "main func is here"
cd 11 00 00
140001083 c7 44 24 MOV dword ptr [RSP + local_res8],0x9
30 09 00
00 00
14000108b c7 44 24 MOV dword ptr [RSP + local_res18],0x0
40 00 00
00 00
140001093 8b 44 24 30 MOV EAX,dword ptr [RSP + local_res8]
140001097 8b 44 24 38 MOV EAX,dword ptr [RSP + local_res10]
14000109b e8 70 ff CALL printf int printf(char * _Format, ...)
ff ff
1400010a0 33 c0 XOR EAX,EAX
1400010a2 48 83 c4 28 ADD RSP,0x28
1400010a6 c3 RET
This was the original ASM output. After assigning the correct types and names to the locals, I now have:
**************************************************************
* FUNCTION *
**************************************************************
int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
assume GS_OFFSET = 0xff00000000
int EAX:4 <RETURN>
int ECX:4 _Argc
char * * RDX:8 _Argv
char * * R8:8 _Env
int Stack[0x18]:4 anotherRandomVar XREF[1]: 14000108b(W)
int Stack[0x10]:4 count1 XREF[2]: 140001074(W),
140001097(R)
int Stack[0x8]:4 count2 XREF[2]: 140001083(W),
140001093(R)
main XREF[2]: __scrt_common_main_seh:1400012cb
14000400c(*)
140001070 48 83 ec 28 SUB RSP,0x28
140001074 c7 44 24 MOV dword ptr [RSP + count1],0x8
38 08 00
00 00
14000107c 48 8d 0d LEA _Argc,[s_main_func_is_here] = "main func is here"
cd 11 00 00
140001083 c7 44 24 MOV dword ptr [RSP + count2],0x9
30 09 00
00 00
14000108b c7 44 24 MOV dword ptr [RSP + anotherRandomVar],0x0
40 00 00
00 00
140001093 8b 44 24 30 MOV EAX,dword ptr [RSP + count2]
140001097 8b 44 24 38 MOV EAX,dword ptr [RSP + count1]
14000109b e8 70 ff CALL printf int printf(char * _Format, ...)
ff ff
1400010a0 33 c0 XOR EAX,EAX
1400010a2 48 83 c4 28 ADD RSP,0x28
1400010a6 c3 RET
Once I applied the correct types, the decompiler gave me the following. So I take it the decompiler is not very helpful here?
int __cdecl main(int _Argc,char **_Argv,char **_Env)
{
int count2;
int count1;
int anotherRandomVar;
printf("main func is here");
return 0;
}
(The decompiled code was empty before I applied the types.)
My main question is about these lines:
140001093 8b 44 24 30 MOV EAX,dword ptr [RSP + count2]
140001097 8b 44 24 38 MOV EAX,dword ptr [RSP + count1]
I believe this is where the two values count1 and count2 are passed into the addition function. What happens when they are moved into EAX? My initial idea is that the function is somehow called after this and looks at the top two entries in the EAX register. The order of operations seems very important here, i.e. once the two values are moved into this register they are consumed right after. Or am I confusing EAX with how the stack operates? I assume the call is just optimized here since the function only does an addition, but I want to see where this addition takes place.
Where is the result of volatile int result = addition(count1, count2); stored?
edit 1:
I compiled and built the EXE in x86 release mode instead of x64 and the ASM seems to be correct now:
**************************************************************
* FUNCTION *
**************************************************************
int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
assume FS_OFFSET = 0xffdff000
int EAX:4 <RETURN>
int Stack[0x4]:4 _Argc
char * * Stack[0x8]:4 _Argv
char * * Stack[0xc]:4 _Env
undefined4 Stack[-0x8]:4 local_8 XREF[2]: 0040104d(W),
0040105b(R)
undefined4 Stack[-0xc]:4 local_c XREF[2]: 00401046(W),
0040105e(R)
undefined4 Stack[-0x10]:4 local_10 XREF[2]: 00401054(W),
00401068(W)
_main XREF[1]: __scrt_common_main_seh:00401241(
main
00401040 55 PUSH EBP
00401041 8b ec MOV EBP,ESP
00401043 83 ec 0c SUB ESP,0xc
00401046 c7 45 f8 MOV dword ptr [EBP + local_c],0x8
08 00 00 00
0040104d c7 45 fc MOV dword ptr [EBP + local_8],0x9
09 00 00 00
00401054 c7 45 f4 MOV dword ptr [EBP + local_10],0x0
00 00 00 00
0040105b 8b 4d fc MOV ECX,dword ptr [EBP + local_8]
0040105e 8b 45 f8 MOV EAX,dword ptr [EBP + local_c]
00401061 03 c1 ADD EAX,ECX
00401063 68 00 21 PUSH s_main_func_is_here = "main func is here"
40 00
00401068 89 45 f4 MOV dword ptr [EBP + local_10],EAX
0040106b e8 a0 ff CALL printf int printf(char * _Format, ...)
ff ff
00401070 83 c4 04 ADD ESP,0x4
00401073 33 c0 XOR EAX,EAX
00401075 8b e5 MOV ESP,EBP
00401077 5d POP EBP
00401078 c3 RET
edit 2:
I also seem to have forgotten to recompile when int result was changed to volatile, so the disassembly is now:
**************************************************************
* FUNCTION *
**************************************************************
int __cdecl main(int _Argc, char * * _Argv, char * * _Env)
assume GS_OFFSET = 0xff00000000
int EAX:4 <RETURN>
int ECX:4 _Argc
char * * RDX:8 _Argv
char * * R8:8 _Env
int Stack[0x18]:4 local_res18 XREF[1]: 140001084(W)
int Stack[0x10]:4 local_res10 XREF[2]: 140001074(W),
140001090(R)
int Stack[0x8]:4 local_res8 XREF[3]: 14000107c(W),
14000108c(R),
140001096(W)
main XREF[2]: __scrt_common_main_seh:1400012cb
14000400c(*)
140001070 48 83 ec 28 SUB RSP,0x28
140001074 c7 44 24 MOV dword ptr [RSP + local_res10],0x8
38 08 00
00 00
14000107c c7 44 24 MOV dword ptr [RSP + local_res8],0x9
30 09 00
00 00
140001084 c7 44 24 MOV dword ptr [RSP + local_res18],0x0
40 00 00
00 00
14000108c 8b 4c 24 30 MOV _Argc,dword ptr [RSP + local_res8]
140001090 8b 44 24 38 MOV EAX,dword ptr [RSP + local_res10]
140001094 03 c8 ADD _Argc,EAX
140001096 89 4c 24 30 MOV dword ptr [RSP + local_res8],_Argc
14000109a 48 8d 0d LEA _Argc,[s_main_func_is_here] = "main func is here"
af 11 00 00
1400010a1 e8 6a ff CALL printf int printf(char * _Format, ...)
ff ff
1400010a6 33 c0 XOR EAX,EAX
1400010a8 48 83 c4 28 ADD RSP,0x28
1400010ac c3 RET
So given the non-volatile disassembly, when reversing, how would I figure out that some addition was taking place? It's optimised as you said, but how would I go about uncovering this? (If I was reverse engineering with no source) Maybe there isn't a straightforward answer and it's more about breaking everything down, or maybe there is something that removes these optimisations in the disassembler?
There are no instructions corresponding to the add or final assignment in volatile int result = addition(count1, count2); in your original asm source. Probably the source you actually compiled omitted volatile on int result, so that declaration/statement could optimize away except for the two MOV loads from two volatile reads which don't use their results, like (void)count1; (void)count2;.
volatile accesses count as visible side-effects and will show up in the asm, but other stuff won't unless the compiler needs it to produce other visible results.
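As a rough illustration of that distinction, here is a minimal sketch with two hypothetical functions (dead_work and volatile_work are names made up for this example, not part of the question's code). In the first, nothing is observable, so an optimizing compiler is free to drop the addition entirely. In the second, every volatile access should be expected to show up as a load or store in the asm.

```cpp
// Plain locals: none of this is observable, so an optimizing compiler
// may reduce the whole function to "return 0".
int dead_work() {
    int a = 8;
    int b = 9;
    int unused = a + b;  // result is never observed; no ADD needs to be emitted
    (void)unused;
    return 0;
}

// Volatile locals: each access is a visible side-effect.
int volatile_work() {
    volatile int count1 = 8;  // store
    volatile int count2 = 9;  // store
    (void)count1;             // volatile read whose value is discarded: still a load
    (void)count2;             // same here
    volatile int result = count1 + count2;  // two loads, an add, and a store
    return result;            // one more volatile load
}
```

Compiling both with optimization enabled and comparing the output is one way to check this against a particular compiler; the exact instructions will vary.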
I'm not surprised Ghidra doesn't show initializers for the local vars, and that it doesn't show them as volatile. If you used it on debug builds, you wouldn't want every variable to be shown as volatile, but at -O0 compilers like gcc/clang/msvc/icc all spill vars from registers between statements, similar to everything being volatile. So you'd get false-positive identification of every local var as volatile when decompiling a debug build. A decompiler is trying to identify computations that lead to things visible outside the function; store/reload of local variables that don't eventually get used some other way is just noise.
I was going to edit your question title to describe the actual situation you're asking about, like "Ghidra decompiler doesn't recover function call that was inlined", but I assume anyone capable of putting the question in those terms already understands that that's an answer. :/
Even if the work hadn't been optimized away, it would be indistinguishable from result = count1 + count2;. The fact that there was a helper function which inlined is totally lost during compilation.
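To make that concrete, here is a small sketch with two hypothetical wrappers (with_helper and without_helper are names invented for this example). Assuming the compiler inlines addition, which optimizing builds typically do for a function this small, both bodies should compile to essentially the same loads, add, and store, so a decompiler has nothing to tell them apart.

```cpp
// Small helper in the style of the question; a typical inlining candidate.
int addition(int a, int b) { return a + b; }

// Version written with the helper call.
int with_helper() {
    volatile int count1 = 8;
    volatile int count2 = 9;
    volatile int result = addition(count1, count2);
    return result;
}

// Version with the addition written out by hand.
int without_helper() {
    volatile int count1 = 8;
    volatile int count2 = 9;
    volatile int result = count1 + count2;
    return result;
}
```

Both functions return the same value, and after inlining neither contains a call instruction, which is why the decompiled output cannot show that addition ever existed as a separate function.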
https://godbolt.org/z/KaGa9axs1 shows MSVC 19.35 x64 optimized code-gen for your code with and without volatile on result. It does add and store to result with volatile.
given the non volatile disassembly, when reversing, how would I figure out that some addition was taking place? It's optimised as you said, but how would I go about uncovering this? (If I was reverse engineering with no source)
You wouldn't see the addition, it doesn't exist in the compiled program because it's not part of the observable behaviour of the original C program. (C defines observable behaviour to include I/O and volatile accesses.) That's why it got optimized away at compile time.
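For contrast, here is a hedged sketch of cases where the sum does become part of the observable behaviour because it feeds I/O. The function names print_constant_sum and print_runtime_sum are hypothetical. With constant inputs a compiler will usually fold the addition at compile time, so you would expect to find the constant 17 rather than an ADD. With inputs only known at run time, an add instruction has to exist somewhere.

```cpp
#include <cstdio>

static int addition(int a, int b) { return a + b; }

// The result is printed, so it is observable. Both inputs are constants,
// so the compiler is likely to pass a precomputed 17 to printf.
int print_constant_sum() {
    int result = addition(8, 9);
    std::printf("result = %d\n", result);
    return result;
}

// The inputs are not known at compile time (unless the caller is inlined
// with constants), so the emitted code needs a real addition.
int print_runtime_sum(int a, int b) {
    int result = addition(a, b);
    std::printf("result = %d\n", result);
    return result;
}
```

Calling print_runtime_sum with values derived from argc or user input is a simple way to keep the compiler from folding the computation when experimenting.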
That makes sense, but how are things like constant values recoverable if they're not present in the disassembly? Is it that the work using the constants is done and then stripped away if it won't be needed again, as in this case? If so, back to my original point: how can I reverse them, or even know of their existence to begin with?
A decompiler can't tell how constants were written in the source, e.g. mmap(PROT_EXEC|PROT_WRITE|PROT_READ) is indistinguishable from mmap(0x7), or whatever those constants actually are on your platform. A smart decompiler that knows about some system-call bit-flags might decompose things for you, like how strace does.
But in other cases, you'll just get stuff like malloc(12) not malloc(3*sizeof(int)). Again, unless the decompiler has patterns to invent a sizeof based on the type it deduces.
Compiling + decompiling loses information about program structure, obviously, not just var names and comments. All constants become hard-coded numbers wherever they're used, so there's no way to tell whether two things with the same value were a coincidence or two uses of the same named constant.
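A short sketch of that information loss, using made-up names throughout (kMaxRetries, kBufferBytes, and the kRead/kWrite/kExec flags are hypothetical and stand in for whatever a real program defines). After compilation each of these is just an immediate value in an instruction.

```cpp
#include <cstddef>

// Hypothetical named constants as they might appear in source.
constexpr int kMaxRetries = 12;
constexpr std::size_t kBufferBytes = 3 * sizeof(int);  // 12 where int is 4 bytes

// Hypothetical bit-flags, combined with OR in the source.
constexpr unsigned kRead = 0x1;
constexpr unsigned kWrite = 0x2;
constexpr unsigned kExec = 0x4;
constexpr unsigned kAll = kRead | kWrite | kExec;  // folded to 0x7 at compile time

// Each function compiles to "return <immediate>"; the names and the
// expressions that produced the values are gone.
int max_retries() { return kMaxRetries; }
std::size_t buffer_bytes() { return kBufferBytes; }
unsigned all_flags() { return kAll; }
```

On a platform with 4-byte int, max_retries and buffer_bytes both return 12, and nothing in the binary says whether that is one shared constant or a coincidence.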