cassemblyprogram-entry-pointlanguage-specifications

Which mechanism knows the entry point of a program is main()


How does an application program know its entry point is the main() function?

I know an application doesn't know its entry point is main() -- it is directed to main() function by means of the language specification whatever it is.

At that point, where is the specification actually declared? For example in C, entry point shall be main() function. Who provides this mechanism to the program? An operating system or compiler?

I came to the question after disassembling a canonical simple "Hello World" example in Visual Studio.

In this code there are only a few lines and a function main().

But after disassembling it, there are lots of definitions and macro in the memory space and main() is not the only declaration and definiton.

Here below disassembling part's screenshot. I also know there is a strict rule in language definition which is only one main() function must be defined and exist.

To summarize my question: I wonder which mechanism directs or sets main() function as an entry point of an application program.

enter image description here


Solution

  • The application does not know that main() is the entry point. Firstly, we assume C not C++ here despite your picture.

    For C the "C" entry point is main(). But you cant just start execution there as we have assumptions, more than that, rules, in C that for example .data needs to be initialized and .bss zeroed.

    unsigned x = 1;
    unsigned int y;
    

    We expect that when main() is hit that x=1. and most folks assume and perhaps it is specified that y = 0 at that time, I wouldn't make that assumption, but anyway.

    We also need a stack pointer and need to deal with argc/argv. If C++ then other stuff has to be done. Even for C depending.

    The APPLICATION does not generally know any of this. You are likely working with a C library and that library is/should be responsible for bootstrap code that preceeds main() as well as a linker script for the linker as bootstrap and linker script are intimately related. And one could argue based on some implementations that the C library is separable from the toolchain as we know with gnu you can choose from different ones and those have different bootstraps and linker scripts. But I am sure there are many that are intimately related, there is also a relationship of the library and the operating system as so many C library calls end up in one or countless system calls.

    You design an operating system, part of the design of the operating system assuming it supports runtime loadable applications is a file format that the operating systems loader supports, features that the operating system loader wants to support and how they overlap with the file format, not uncommon for the OS to define the file format, but with elf and others (not accidentally/independently created no doubt) you have opportunities for a new OS to use an existing container like elf. The OS design and its loader determines a lot of things, and the C library that mates up with all of that has to follow all of those rules, if integrated into the compiler then the compiler has to play along as well.

    It is not the application that knows it is part of the system design and the application is simply a slave to all of that, when you compile on that platform for that platform all of these rules and relationships are in play, you are putting in a very small part of the puzzle, the rest is already in place, what file formats are supported, per format what information is required, what rules are required that the compiler/library solution must provide. The system design dictates if .data and .bss are zeroed by the loader or by the application and what I mean by that is by the bootstrap not the user's portion of the program, you cant bootstrap C in C because that C would need a bootstrap and if that bootstrap were in C that C would need a bootstrap and so on.

    int main ( void )
    {
        return 0;  
    }
    

    there are a lot of things going on in the background when you compile that program not just the few instructions that might be needed to implement that code.

    compile that program on windows and Linux and mac and different versions of each with different compilers for each or C libraries, and different versions of each, etc. And what you should expect to see is perhaps even if the same target ISA, same computer even, some percentage of the combinations MIGHT choose the same few instructions for the function, what is wrapped around it is expected to be maybe similar but not the same. Would be no reason to be surprised if some of the implementations are very different from each other.

    And this is all for full blown operating systems that load programs into ram and run them, for embedded things don't be surprised if the differences are even bigger. Within a full blown os you would expect to see an mmu and the application gets a perhaps zero based address space for .text, .data, .bss at a minimum so all the solutions might have a favorite place or favorite number of sections in the same order in the binary but the size of each may be specific to the implementation. The order/size might vary by C library version or compiler version, etc.

    The magic is in the system design. and that is not magic, that is design. main() cannot be entered directly and still have various parts of the language still work like .data and .bss init, stack pointer can be solved before the entry but how and where .data and .bss are is application specific so cant be handled by a simple branch to main from the OS.

    The linker for your toolchain can be told in various ways where the entry point is it could be assumed/dictated for that tool/target or a command line option or a linker script option, or some special symbol you put on a label or whatever the designers choose. main is assumed to be the C entry point, although that doesn't actually mean it is there might be some C code that precedes it but in general there is some amount of asm (cant bootstrap C with C) and then one or more steps to main().