I am trying to learn x86_64 assembly by reverse engineering programs to see how assembly works. I do this by compiling my C code to assembly with gcc. The command I use for this is
gcc -O0 -S my_C_code.c
However, in doing so the compiler produces a mixture of 32 bit and 64 bit instructions. This confuses me as I thought the compiler would only produce 64 bit instructions. An example of this is :
subq $32, %rsp
movl %edi, -20(%rbp)
movl %esi, -24(%rbp)
movq %rdx, -32(%rbp)
The compiler uses a 64 bit instruction for the first line but 32 bit instructions for the next two. Other parts of the program are similar. For context, the code snippet above was taken from a subroutine that takes 3 arguments: an integer, an integer and a pointer, in that order.
From what I've read so far , I think that this happens because 32 bit registers are used for integers and 64 bit for pointers. This raises another question. I am able to successfully change the %edi register and everything related to it to 64 bit. However, in doing so for the %esi register , I face a segmentation fault. Could this be due to 16 byte stack alignment ?
I think that this happens because 32 bit registers are used for integers and 64 bit for pointers.
Your C code probably uses int. I think the compiler treats int as 4 bytes, so a 32-bit register is used. If you use something like int64_t, 64-bit registers would be used.
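As a rough sketch of that difference, here are two hypothetical functions (sum32 and sum64 are names I made up, not from your code) with the same argument layout as your subroutine: two integers and a pointer. Assuming gcc -O0 on x86-64 Linux, I would expect the first to spill its integer arguments with movl from %edi/%esi and the second with movq from %rdi/%rsi, while the pointer goes through %rdx with movq in both. You can compile each with gcc -O0 -S and compare.

```c
#include <stdint.h>

/* Hypothetical example: int arguments are 4 bytes wide, so the
 * 32-bit halves of the argument registers are enough to hold them. */
int sum32(int a, int b, int *out)
{
    *out = a + b;
    return *out;
}

/* Hypothetical example: int64_t arguments are 8 bytes wide, so the
 * full 64-bit argument registers are needed. */
int64_t sum64(int64_t a, int64_t b, int64_t *out)
{
    *out = a + b;
    return *out;
}
```

The pointer argument is 8 bytes in both versions, which matches the movq you see for %rdx.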
I am able to successfully change the %edi register and everything related to it to 64 bit. However, in doing so for the %esi register , I face a segmentation fault.
Remember that esi/edi are 4 bytes but rsi/rdi are 8 bytes.
The original state is
+--------+-----+
| rbp-24 | esi |
+--------+-----+
| rbp-20 | edi |
+--------+-----+
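If you change edi to rdi, then
+--------+------------------------+
| rbp-24 | esi                    |
+--------+------------------------+
| rbp-20 | Low-bytes of rdi (edi) |
+--------+------------------------+
| rbp-16 | High-bytes of rdi      |
+--------+------------------------+
x86-64 is little-endian, so the low bytes stay at rbp-20 and the extra 4 bytes spill into rbp-16. That slot does not hold either of the other arguments, which would explain why this change appeared to work.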
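If you change esi to rsi, then
+--------+------------------------+-----+
| rbp-24 | Low-bytes of rsi (esi) |     |
+--------+------------------------+-----+
| rbp-20 | High-bytes of rsi      | edi |
+--------+------------------------+-----+
x86-64 is little-endian, so the low 4 bytes of rsi (esi) stay at rbp-24 and the high 4 bytes land at rbp-20, in the slot that holds edi. You'll notice that the 8-byte rsi slot now overlaps the saved edi value: since rsi is stored after edi, the store overwrites edi, and any 8-byte load from rbp-24 picks up whatever is at rbp-20 as its upper half.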
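To make the overlap concrete, here is a small C model of those two stack slots. It is only a sketch under my own assumptions: the function name saved_edi_after_spill and the 8-byte frame array are made up for illustration, frame[0] stands for rbp-24 and frame[4] for rbp-20, and the machine is assumed to be little-endian like x86-64. It replays the two stores in the order they appear in your listing and then reloads the first argument.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the stack bytes rbp-24 .. rbp-17.
 * frame[0] corresponds to rbp-24, frame[4] to rbp-20.
 * Returns the value a later 4-byte load from rbp-20 would see. */
uint32_t saved_edi_after_spill(uint32_t edi, uint64_t rsi, int widen_esi)
{
    unsigned char frame[8] = {0};
    uint32_t reloaded;

    /* movl %edi, -20(%rbp): 4-byte store of the first argument */
    memcpy(frame + 4, &edi, sizeof edi);

    if (widen_esi) {
        /* movq %rsi, -24(%rbp): 8-byte store, covers rbp-20 as well */
        memcpy(frame, &rsi, sizeof rsi);
    } else {
        /* movl %esi, -24(%rbp): 4-byte store, stays within its slot */
        uint32_t esi = (uint32_t)rsi;
        memcpy(frame, &esi, sizeof esi);
    }

    /* movl -20(%rbp), %eax: reload the first argument */
    memcpy(&reloaded, frame + 4, sizeof reloaded);
    return reloaded;
}
```

With widen_esi set to 0 the function returns the original edi. With widen_esi set to 1 it returns the upper 32 bits of rsi instead, so the first argument is silently replaced by whatever happened to be in the top half of the register.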
I don't know your C code so this is a guess, but the crash probably involves a function call. Perhaps your code crashes if the second argument (which is passed in rsi under the System V AMD64 ABI) has an unexpected value.