main.cpp code:
#include <iostream>
int main()
{
std::cout << "Hello World!\n";
}
Disassembly in Ghidra:
*************************************************************
* FUNCTION
*************************************************************
int __cdecl main (int _Argc , char * * _Argv , char * * _E
assume GS_OFFSET = 0xff00000000
int EAX:4 <RETURN>
int ECX:4 _Argc
char * * RDX:8 _Argv
char * * R8:8 _Env
undefined1 Stack[-0x10]:1 local_10 XREF[1]:
140012292 (*)
undefined1 Stack[-0xd8]:1 local_d8 XREF[1]: 14001226a (*)
main XREF[1]: main:1400112cb (T) ,
main:1400112cb (j)
140012260 40 55 PUSH RBP
140012262 57 PUSH RDI
140012263 48 81 ec SUB RSP ,0xe8
e8 00 00 00
14001226a 48 8d 6c LEA RBP =>local_d8 ,[RSP + 0x20 ]
24 20
14001226f 48 8d 0d LEA _Argc ,[__6AFE2A9E_TestApplication@cpp ] = 01h
f0 0d 01 00
140012276 e8 63 f1 CALL __CheckForDebuggerJustMyCode void __CheckForDebuggerJustMyCod
ff ff
14001227b 90 NOP
14001227c 48 8d 15 LEA _Argv ,[s_Hello_World!_ ] = "Hello World!\n"
a5 89 00 00
140012283 48 8b 0d MOV _Argc ,qword ptr [->MSVCP140D.DLL::std::cout ] = 00021d22
0e ef 00 00
14001228a e8 f8 ed CALL std::operator<<<> basic_ostream<char,std::char_tra
ff ff
14001228f 90 NOP
140012290 33 c0 XOR EAX ,EAX
140012292 48 8d a5 LEA RSP =>local_10 ,[RBP + 0xc8 ]
c8 00 00 00
140012299 5f POP RDI
14001229a 5d POP RBP
14001229b c3 RET
I'm trying to understand this very simple program, which I wrote to practice disassembly in Ghidra.
The prologue makes sense to me. The previous RBP is pushed onto the stack, and 0xE8 (232 bytes) is subtracted from RSP to allocate the current stack frame. The address RSP+0x20 is stored in RBP+0xD8 on the stack, but then things stop making sense to me...
An address is loaded into _Argc... The address of "Hello World!\n" is loaded into _Argv, which is a char**, so a pointer to character arrays... Then a quadword pointer (8 bytes), since this is a 64-bit application, is loaded into _Argc (argument count), which is an integer or 4-byte variable, and then (more or less) std::cout is called.
Why does it seem as though the local variables _Argc and _Argv were repurposed here to hold other local variables, one of which is a different and larger data type? Am I completely misreading this? I can't really wrap my head around it, and my only hypothesis is that it's some compiler optimization magic.
It would make sense to me, just spitballing, that std::cout, depending on its calling convention, expects a character array in RDX (_Argv) that holds the strings to be output, but also uses ECX (_Argc) to hold the number of strings to be output?
I may be entirely wrong. Just trying to wrap my head around the assembly generated by Visual Studio 2022. Any help or insight would be much appreciated. My attempts at Googling were unfruitful.
Could it also just be happenstance that the disassembler is using _Argc and _Argv in place of ECX and RDX because those registers were used to pass those respective arguments to the current function? Thanks for any help.
That's bad, misleading disassembly.
RCX and RDX are the first 2 arg-passing registers in Windows x64 so the args for any function call have to go there. (Unless the args are floating-point).
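To make the two arguments concrete: the expression std::cout << "Hello World!\n" resolves to a call of the non-member std::operator<< template, which is the function Ghidra names in the CALL. The following is a minimal sketch using only the standard library. The names write_greeting and greeting_text are hypothetical helpers introduced for illustration, and the register comments assume the Windows x64 convention described above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: spells out the call that `os << "Hello World!\n"`
// resolves to, with both arguments visible.
std::ostream& write_greeting(std::ostream& os)
{
    // 1st argument: the stream object (its address goes in RCX on Windows x64).
    // 2nd argument: pointer to the string literal (goes in RDX).
    return std::operator<<(os, "Hello World!\n");
}

// Hypothetical helper: captures the text in a string so it can be inspected.
std::string greeting_text()
{
    std::ostringstream buffer;
    write_greeting(buffer);
    assert(!buffer.fail()); // writing to an in-memory stream should succeed
    return buffer.str();
}
```

Calling write_greeting(std::cout) prints the same line as the original main, so the stream address and the string pointer are simply the first and second arguments of that call. There is no argument count involved.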
MOV _Argc ,qword ptr [->MSVCP140D.DLL::std::cout ] is actually MOV RCX, ... not ECX, so calling it _Argc is even more misleading.
The first byte of the machine code is 48 (hex), so the high bit of the low nibble is set. That's the W(idth) bit of the REX prefix, meaning 64-bit operand-size, so it's writing RCX, not just ECX. We can also see that from the qword ptr in the memory source operand.
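The REX.W check can be written out in a few lines of C++. This is only a sketch: the function names are hypothetical, and operand_size_bits deliberately ignores 0x66 prefixes, byte-sized opcodes, and instructions such as PUSH/POP that default to 64-bit. It therefore only fits ordinary integer instructions like the MOV and XOR in the listing.

```cpp
#include <cassert>
#include <cstdint>

// A REX prefix has the bit layout 0100WRXB, i.e. the bytes 0x40 through 0x4F.
constexpr bool is_rex_prefix(std::uint8_t byte)
{
    return (byte & 0xF0) == 0x40;
}

// REX.W is bit 3, the high bit of the low nibble; 1 selects 64-bit operand size.
constexpr bool rex_w(std::uint8_t byte)
{
    return is_rex_prefix(byte) && (byte & 0x08) != 0;
}

// Hypothetical helper: operand size implied by the first byte of an ordinary
// integer instruction whose default operand size is 32 bits (simplified).
constexpr int operand_size_bits(std::uint8_t first_byte)
{
    return rex_w(first_byte) ? 64 : 32;
}

// Usage sketch: check two leading bytes taken from the listing.
inline void check_listing_bytes()
{
    assert(operand_size_bits(0x48) == 64); // 48 8b 0d ...: MOV RCX, qword ptr [...]
    assert(operand_size_bits(0x33) == 32); // 33 c0: XOR EAX, EAX (no REX prefix)
}
```

The same test applies to the two LEA instructions that start with 48: they write the full RCX and RDX, which is what you'd expect for pointer arguments.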
It's normal for compilers to use the incoming arg register for different things over the life of a function, so it's very unhelpful to replace those register names with _Argc and _Argv in general.
For example, a variable-count shift would need to use CL for the count; if you had int main(int argc, char *argv[]){ return argc << atoi(argv[1]); } you'd see it move the atoi return value into ECX, reload or copy argc from wherever it had stashed it across the call to atoi, and run shl eax, cl. (https://godbolt.org/z/4MnfPG6n6)
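If you want to experiment with that pattern outside of main, the same expression can be pulled into a standalone function. This is a sketch under stated assumptions: shift_by_text is a hypothetical name, the register comments describe what a Windows x64 compiler would typically do (not a guaranteed instruction sequence), and the shift is only well-defined for a non-negative value and a small non-negative count.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper mirroring `argc << atoi(argv[1])`.
// On entry (Windows x64): value is in ECX, count_text is in RDX.
int shift_by_text(int value, const char* count_text)
{
    assert(count_text != nullptr); // stands in for "argv[1] exists"

    // The call to atoi needs its own first argument in RCX, so the compiler
    // has to stash `value` somewhere else first; the result comes back in EAX.
    const int count = std::atoi(count_text);

    // A variable-count SHL takes its count in CL, so ECX is typically reused
    // once more, this time for `count`. Assumes 0 <= count < 31 and value >= 0.
    return value << count;
}
```

In a function like this, a disassembler that keeps labelling RCX with the first parameter's name would call it value even while it holds the string pointer or the shift count, which is the same kind of confusion as _Argc in the question.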