I write empty programs to annoy the hell out of stackoverflow coders, NOT. I am just exploring the gnu toolchain.
Now the following might be too deep for me, but to continue the empty program saga I have started to examine the output of the C compiler, the stuff GNU as consumes.
gcc version 4.4.0 (TDM-1 mingw32)
test.c:
int main()
{
return 0;
}
gcc -S test.c
.file "test.c"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call ___main
movl $0, %eax
leave
ret
Can you explain what happens here? Here is my effort to understand it. I have used the as manual and my minimal x86 ASM knowledge:
.file "test.c": the directive for the logical filename.
.def: according to the docs, "Begin defining debugging information for a symbol name". What is a symbol (a function name/variable?) and what kind of debugging information?
.scl: docs say "Storage class may flag whether a symbol is static or external". Is this the same static and external I know from C? And what is that '2'?
.type: stores the parameter "as the type attribute of a symbol table entry". I have no clue.
.endef: no problem.
.text: now this is problematic. It seems to be something called a section, and I have read that it's the place for code, but the docs didn't tell me too much.
.globl: "makes the symbol visible to ld." The manual is quite clear on this.
_main: this might be the starting address (?) for my main function.
pushl: a long (32-bit) push, which places EBP on the stack.
movl: 32-bit move. Pseudo-C: EBP = ESP;
andl: logical AND. Pseudo-C: ESP = -16 & ESP. I don't really see what the point of this is.
call: pushes the IP to the stack (so the called procedure can find its way back) and continues where ___main is. (What is ___main?)
movl: this zero must be the constant I return at the end of my code. The MOV places this zero into EAX.
leave: restores the stack after an ENTER instruction (?). Why?
ret: goes back to the instruction address that is saved on the stack.
Thank you for your help!
.file "test.c"
Commands starting with . are directives to the assembler. This one just says the source file is "test.c"; that information can be exported to the debugging information of the exe.
.def ___main; .scl 2; .type 32; .endef
The .def directive defines a debugging symbol. .scl 2 means storage class 2 (the external storage class), and .type 32 says this symbol is a function. These numbers are defined by the PE/COFF executable format.
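To see where the 2 and the 32 come from, here is a small sketch of how a COFF symbol type is encoded. The enum and function names are hypothetical and chosen for illustration. The values follow the COFF convention as I understand it: storage class 2 is "external", and the derived type "function" (value 2) is stored shifted left by 4 bits above the base type.

```c
/* Hypothetical names; values follow the COFF symbol-table convention. */
enum {
    COFF_CLASS_EXTERNAL = 2,  /* the "2" in ".scl 2" */
    COFF_DTYPE_FUNCTION = 2,  /* derived type: function */
    COFF_DTYPE_SHIFT    = 4,  /* derived type sits above the 4-bit base type */
    COFF_DTYPE_MASK     = 0x3
};

/* Combine a base type and a derived type into one .type value. */
unsigned coff_make_type(unsigned base, unsigned derived)
{
    return (derived << COFF_DTYPE_SHIFT) | base;
}

/* Return 1 if the .type value describes a function, 0 otherwise. */
int coff_is_function(unsigned type)
{
    return ((type >> COFF_DTYPE_SHIFT) & COFF_DTYPE_MASK) == COFF_DTYPE_FUNCTION;
}
```

With a base type of 0, coff_make_type(0, COFF_DTYPE_FUNCTION) gives 32, which matches the .type 32 in the listing.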
___main is a function that takes care of the bootstrapping gcc needs (it does things like running C++ static initializers and other housekeeping).
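As a rough illustration of that kind of housekeeping, the sketch below uses GCC's constructor attribute to run a function before the body of main. The names early_init and startup_ran are hypothetical. This assumes GCC or a compatible compiler; to my understanding such constructors are run from ___main on MinGW, while other targets use a different startup mechanism.

```c
/* Hypothetical flag, set by a constructor before main's body runs. */
static int initialized_before_main = 0;

/* GCC extension: run this function during program startup. */
__attribute__((constructor))
static void early_init(void)
{
    initialized_before_main = 1;
}

/* Returns 1 if the constructor has already run, 0 otherwise. */
int startup_ran(void)
{
    return initialized_before_main;
}
```

Calling startup_ran() as the first statement of main should report that the constructor has already executed.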
.text
Begins a text section - code lives here.
.globl _main
Declares the _main symbol as global, which makes it visible to the linker and to other modules that are linked in.
.def _main; .scl 2; .type 32; .endef
Same thing as for ___main above: it creates a debugging symbol stating that _main is a function. This can be used by debuggers.
_main:
Starts a new label (it will end up as an address). The .globl directive above makes this address visible to other entities.
pushl %ebp
Saves the old frame pointer (the ebp register) on the stack, so it can be put back in place when this function ends.
movl %esp, %ebp
Copies the stack pointer into the ebp register. ebp is often called the frame pointer: it marks the base of the stack values within the current "frame" (usually one per function call). Referring to variables on the stack via ebp can help debuggers.
andl $-16, %esp
ANDs the stack pointer with 0xfffffff0, which effectively aligns it down to a 16-byte boundary. Access to aligned values on the stack is much faster than if they were unaligned. All these preceding instructions are pretty much a standard function prologue.
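The effect of the AND can be checked with a few lines of C. This is a minimal sketch; the function name align_down_16 and the sample stack pointer values are made up for illustration.

```c
#include <stdint.h>

/* Same operation as "andl $-16, %esp": clear the low four bits. */
uint32_t align_down_16(uint32_t esp)
{
    /* (uint32_t)-16 is 0xfffffff0 */
    return esp & (uint32_t)-16;
}
```

For example, a stack pointer of 0x0022ff4c becomes 0x0022ff40, and a value that is already a multiple of 16 is left unchanged. The stack grows downward, so rounding down only ever reserves a little extra space.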
call ___main
Calls the ___main function, which does the initialization gcc needs. call pushes the return address (the address of the instruction after the call) on the stack and jumps to the address of ___main.
movl $0, %eax
Moves 0 into the eax register (the 0 in return 0;). The eax register holds integer return values in the cdecl calling convention used here (stdcall uses eax the same way).
leave
The leave instruction is pretty much shorthand for
movl %ebp, %esp
popl %ebp
i.e. it undoes the work done at the start of the function, restoring the frame pointer and stack pointer to their former state.
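To make the round trip concrete, here is a toy simulation in C of the prologue followed by leave. The Cpu struct, the tiny word-addressed stack, and the function names are all hypothetical; this only models the register arithmetic, not real hardware. It assumes the entry stack pointer is a multiple of 4 between 4 and 256.

```c
#include <stdint.h>

/* Toy machine: two registers and a small stack (address = index * 4). */
typedef struct {
    uint32_t esp, ebp;
    uint32_t stack[64];
} Cpu;

static void push(Cpu *c, uint32_t v) { c->esp -= 4; c->stack[c->esp / 4] = v; }
static uint32_t pop(Cpu *c) { uint32_t v = c->stack[c->esp / 4]; c->esp += 4; return v; }

/* Runs the prologue and then leave; returns 1 if esp and ebp
   end up with the values they had on entry. */
int frame_round_trip(uint32_t entry_esp, uint32_t caller_ebp)
{
    Cpu c = { entry_esp, caller_ebp, {0} };

    push(&c, c.ebp);          /* pushl %ebp      */
    c.ebp = c.esp;            /* movl %esp, %ebp */
    c.esp &= (uint32_t)-16;   /* andl $-16, %esp */

    c.esp = c.ebp;            /* leave, part 1: movl %ebp, %esp */
    c.ebp = pop(&c);          /* leave, part 2: popl %ebp       */

    return c.esp == entry_esp && c.ebp == caller_ebp;
}
```

Note that leave restores esp from ebp, so it does not matter how far the alignment step moved esp in between.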
ret
Returns to whoever called this function. It'll pop the instruction pointer from the stack (which a corresponding call instruction will have placed there) and jump there.