Tags: assembly, x86, linker, bootloader, linker-scripts

Role of linker scripts when writing x86 assembly


I am learning x86 assembly out of curiosity, to understand low-level details, and came across this great repository here that contains lots of examples that can be run from EFI shell.

When I check this hello world example, there is a linker script with these contents:

ENTRY(mystart)
SECTIONS
{
  . = 0x7c00;
  .text : {
    entry.o(.text)
    *(.text)
    *(.data)
    *(.rodata)
    __bss_start = .;
    /* COMMON vs BSS: https://stackoverflow.com/questions/16835716/bss-vs-common-what-goes-where */
    *(.bss)
    *(COMMON)
    __bss_end = .;
  }
  /* https://stackoverflow.com/questions/53584666/why-does-gnu-ld-include-a-section-that-does-not-appear-in-the-linker-script */
  .sig : AT(ADDR(.text) + 512 - 2)
  {
      SHORT(0xaa55);
  }
  /DISCARD/ : {
    *(.eh_frame)
  }
  __stack_bottom = .;
  . = . + 0x1000;
  __stack_top = .;
}

I am not able to understand why exactly it is required. Is it just to specify the load address? My general understanding of linker scripts was that they are more useful when there is more than one object file, because they can define how sections from multiple object files are combined into a single executable.

What if I don't specify the linker script in this example? (there are definitely at least 2 object files - one resulting from .s and one from .c)


Solution

  • Note that that is a bare-metal example, meaning no operating system.

    The GNU toolchain installed on your computer was most likely built for that computer, including its operating system.

    So when you apt-get install build-essential and then run gcc hello.c -o hello, the linker script used was part of the installed toolchain and was specific to Linux and your distro. (Even if you build the toolchain and libc from sources, the build detects the host and, unless it is being built as a cross compiler, makes the stock bootstrap and linker script for that host the default.)

    When you find and install a GNU toolchain for Windows, the linker script buried in that install is specific to Windows.

    But when you want to use a toolchain as a cross compiler, in this case for bare metal, you need to link for the target environment, which usually means bringing along your own linker script. This one is over-complicated, as usual, but at least they provided one.

    Because this is x86 bare metal and you are developing on an x86 host, you can (sometimes) use the native compiler as a cross compiler. The same goes for building for ARM on an ARM host (a Raspberry Pi, for example), and so on.

    Without a linker script, a cross build uses the toolchain's default one, and if you have not customized that default for your target you will likely get a build that won't work.

    The job of a linker script is primarily to define the address space to the linker: I want .text at this address, I want .data at this address, and so on. You can do this on the command line without a linker script, but a script becomes the simpler option the more complicated your layout gets, and gnu ld has some issues (bugs) with command line vs linker script. The secondary reason is that specific languages need a bootstrap, and the bootstrap has to meet some assumptions the language makes; to do that, the bootstrap needs the address-space information that the linker works out from the linker script. You are letting the linker/tools do the work for you.

    So for C it is assumed that .bss is zeroed and .data is filled with the items you asked for before the entry point to your code is called (generally main(), but in bare metal you can do whatever you want and often don't want to use that function name). As a labor-saving device you use a linker to place all the items where you asked: all the text, all the bss, data, rodata, etc. It also patches up external connections between the functions.

    But now that the linker knows where .bss is and how big it is, for example, how do you communicate that to the bootstrap code? GNU and other toolchains provide a mechanism for that. (GNU's solution is not expected to be portable to any other toolchain; assume all linker script languages are custom to the toolchain and non-portable, so you have to write a new linker script and a new bootstrap for each toolchain.) You can create variables in the linker script which the linker fills in with whatever you ask for, such as the starting and ending address of .bss, or you can do more math in the linker script and get the starting address and size of .bss. You then import those variables into the bootstrap assembly language code (you can't use C, that is a chicken-and-egg problem), and now the bootstrap can zero out .bss.

    So I call this a marriage between the bootstrap code and the linker script, both of which are toolchain specific, for more than one reason. First, assembly language is defined by the assembler, not the target, so there is no reason to assume that x86 assembly language for one toolchain is compatible with another toolchain's assembler (this has nothing to do with Intel vs AT&T). Second, the linker script language is specific to its toolchain and is not assumed to be portable across toolchains either. So you are using languages specific to the toolchain, and for C, as an example, you have tasks that you have to perform before calling any of the compiled code. The two or more files that make up linking and bootstrap are intimately connected.

    Note that this example also has some bootstrap code included. I would look for a cleaner example that uses real assembly rather than inline assembly, especially since there is an assembly language file in the project; the C part could have been demonstrating C instead of being a scripty inline assembly language thing. It does appear to link to a tutorial that explains what is going on, so perhaps all of this is explained there.

    A beauty of bare metal is that you can do whatever you want; you have fewer rules to live by, and this author has taken advantage of that. I personally don't expect .bss to be zeroed and don't use .data, so my non-portable portions, the linker script and bootstrap, are much, much less complicated. You are welcome to your own style and preferences; that is the beauty of bare-metal programming.