c++ assembly x86-64 disassembly ghidra

Why does it seem as though the compiler is repurposing Argc and Argv in my function?


main.cpp code:

#include <iostream>

int main()
{
    std::cout << "Hello World!\n";
}

Disassembly in Ghidra:

                     *************************************************************
                     *                           FUNCTION                          
                     *************************************************************
                     int  __cdecl  main (int  _Argc , char * *  _Argv , char * *  _E
                       assume GS_OFFSET = 0xff00000000
     int               EAX:4          <RETURN>
     int               ECX:4          _Argc
     char * *          RDX:8          _Argv
     char * *          R8:8           _Env
     undefined1        Stack[-0x10]:1 local_10                                XREF[1]:     140012292 (*)   
         undefined1        Stack[-0xd8]:1 local_d8                                XREF[1]:     14001226a (*)   
                         main                                            XREF[1]:     main:1400112cb (T) , 
                                                                                      main:1400112cb (j)   
   140012260 40  55           PUSH       RBP
   140012262 57              PUSH       RDI
   140012263 48  81  ec       SUB        RSP ,0xe8
             e8  00  00  00
   14001226a 48  8d  6c       LEA        RBP =>local_d8 ,[RSP  + 0x20 ]
             24  20
   14001226f 48  8d  0d       LEA        _Argc ,[__6AFE2A9E_TestApplication@cpp ]          = 01h
             f0  0d  01  00
   140012276 e8  63  f1       CALL       __CheckForDebuggerJustMyCode                     void __CheckForDebuggerJustMyCod
             ff  ff
   14001227b 90              NOP
   14001227c 48  8d  15       LEA        _Argv ,[s_Hello_World!_ ]                         = "Hello World!\n"
             a5  89  00  00
   140012283 48  8b  0d       MOV        _Argc ,qword ptr [->MSVCP140D.DLL::std::cout ]    = 00021d22
             0e  ef  00  00
   14001228a e8  f8  ed       CALL       std::operator<<<>                                basic_ostream<char,std::char_tra
             ff  ff
   14001228f 90              NOP
   140012290 33  c0           XOR        EAX ,EAX
   140012292 48  8d  a5       LEA        RSP =>local_10 ,[RBP  + 0xc8 ]
             c8  00  00  00
   140012299 5f              POP        RDI
   14001229a 5d              POP        RBP
   14001229b c3              RET

Trying to understand this very simple program I made to practice disassembly in Ghidra.

The prologue makes sense to me. The previous RBP is pushed onto the stack, and 0xE8 (232 bytes) is subtracted from RSP to allocate the current stack frame.

The address RSP+0x20 is stored in RBP+0xD8 on the stack but then things stop making sense to me...

An address is loaded into _Argc... The address of "Hello World!\n" is loaded into _Argv, which is a char**, so a pointer to character arrays... Then a quadword pointer (8 bytes, since this is a 64-bit application) is loaded into _Argc (argument count), which is an integer or 4-byte variable. Then (more or less) std::cout is called.

Why does it seem as though local variables _Argc and _Argv were repurposed here to hold other local variables, one of which is a different and larger data type? Am I completely misreading this? I can't really wrap my head around it and my only hypothesis is that it's some compiler optimization magic.

It would make sense to me, just spitballing, that std::cout, depending on its calling convention, expects a character array in RDX (_Argv) that holds the strings to be output. But does it also use ECX (_Argc) to hold the number of strings to be output?

I may be entirely wrong. Just trying to wrap my head around the assembly generated by Visual Studio 2022. Any help or insight would be much appreciated. My attempts at Googling were unfruitful.

Could it also just be happenstance that the disassembler is using _Argc and _Argv in place of ECX and RDX because those registers were used to pass those respective arguments to the current function? Thanks for any help.


Solution

  • That's bad, misleading disassembly.

    RCX and RDX are the first two arg-passing registers in the Windows x64 calling convention, so the first two integer or pointer args for any function call have to go there. (Floating-point args go in XMM registers instead.)
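
To make those two arguments visible at the source level, here is a minimal sketch that spells out `os << "Hello World!\n"` as the ordinary two-argument function call it resolves to. The helper name `print_hello` is hypothetical and not part of the question's program. Assuming the Windows x64 convention described above, the stream would travel in RCX and the string pointer in RDX, which lines up with the two loads just before the CALL to operator<< in the listing.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: same effect as `os << "Hello World!\n"`,
// written as the explicit free-function call.
//   argument 1: the stream (a reference, passed as an address) -> RCX
//   argument 2: pointer to the string literal                  -> RDX
// The register assignments assume the Windows x64 calling convention.
std::ostream& print_hello(std::ostream& os) {
    return std::operator<<(os, "Hello World!\n");
}
```

Calling `print_hello(std::cout)` prints the same text as the original `main`. Nothing here counts strings: the callee receives one stream and one pointer.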

    MOV _Argc ,qword ptr [->MSVCP140D.DLL::std::cout ]
    is actually MOV RCX, ... not ECX, so calling it _Argc is even more misleading.

    The first byte of the machine code is 48 (hex), a REX prefix whose low nibble has its high bit set. That's the W(idth) bit, meaning 64-bit operand-size, so it's writing RCX, not just ECX. We can also see that from the qword ptr in the memory source operand.
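
The bit test described above can be sketched in a few lines of C++. This is only an illustration of the REX prefix layout (0100WRXB); the names `RexPrefix`, `is_rex` and `decode_rex` are made up for this example, and a real decoder would also need to handle legacy prefixes that may precede the REX byte.

```cpp
#include <cstdint>

// Fields of an x86-64 REX prefix byte, laid out as 0100WRXB.
// (Hypothetical type for illustration only.)
struct RexPrefix {
    bool w;  // 1 = 64-bit operand size
    bool r;  // extends the ModRM.reg field
    bool x;  // extends the SIB.index field
    bool b;  // extends ModRM.rm / SIB.base / opcode register field
};

// True when the byte is in the REX range 0x40..0x4F (high nibble 0100).
constexpr bool is_rex(std::uint8_t byte) {
    return (byte & 0xF0) == 0x40;
}

// Splits the low nibble of a REX byte into its four flag bits.
constexpr RexPrefix decode_rex(std::uint8_t byte) {
    return RexPrefix{
        (byte & 0x08) != 0,
        (byte & 0x04) != 0,
        (byte & 0x02) != 0,
        (byte & 0x01) != 0,
    };
}

// 48 8b 0d ... : the MOV that loads the std::cout pointer in the listing.
static_assert(is_rex(0x48) && decode_rex(0x48).w, "REX.W set: 64-bit operand");
// 40 55 : the PUSH RBP in the prologue has a REX prefix with no bits set.
static_assert(is_rex(0x40) && !decode_rex(0x40).w, "REX present, W clear");
```

The two `static_assert` lines use prefix bytes taken from the listing in the question, so the check runs at compile time.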

    It's normal for compilers to use the incoming arg register for different things over the life of a function, so it's very unhelpful to replace those register names with _Argc and _Argv in general.

    For example, a variable-count shift would need to use CL for the count; if you had int main(int argc, char *argv[]){ return argc << atoi(argv[1]); } you'd see it move the atoi return value into ECX, reload or copy argc from wherever it had stashed it across the call to atoi, and run shl eax, cl. (https://godbolt.org/z/4MnfPG6n6)
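
As a rough sketch of that example, the same expression can be wrapped in an ordinary function so it is easy to call and inspect in a disassembler. The name `shift_by_parsed` is hypothetical. The register comments assume the Windows x64 convention, and the exact instructions depend on the compiler and optimization level.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the answer's main(): same expression,
// different function name so it can be called directly.
// On entry (Windows x64): value is in ECX, count_text is in RDX.
int shift_by_parsed(int value, const char* count_text) {
    // atoi wants its argument in RCX, so the compiler has to overwrite
    // the register that held `value` and keep `value` somewhere else.
    int count = std::atoi(count_text);
    // A variable-count shift takes its count in CL, so ECX is reused
    // again, this time for the atoi result.
    // Note: the shift is only well defined for 0 <= count < 32 and
    // a non-negative `value`.
    return value << count;
}
```

For instance, `shift_by_parsed(3, "2")` returns 12. Within this one short function, RCX would hold three unrelated things in turn: the incoming `value`, the pointer handed to atoi, and the shift count.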