assembly linker compiler-construction

From Object Code to Executable


I would like to know what happens to object code when we use the linker to turn it into an executable.

I presume that the linker's job is not the same on Linux and on Windows; I am on Linux.


Solution

  • Object code is lacking information about the big picture. It contains executable code for functions, but all references to other, external functions, as well as to global data, cannot be part of the actual instructions, since their addresses are not known. So instead all those references are left blank (e.g. just filled with zero bytes in the object code) and annotated with a symbol name.

    It's the linker's job to look at all the missing symbol names and match them up against all the exported names (i.e. functions and global data provided by the object files), then find a permanent location for each datum, and finally rewrite all the code to replace the zero bytes with the actual addresses at which the data (functions and global variables) are ultimately stored.
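
    The matching and patching steps described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `symbol` and `reloc` structs and the `resolve` and `link_toy` functions are hypothetical names, not part of any real object format or linker, and the sketch writes plain 32-bit absolute addresses in host byte order, whereas real relocations come in several kinds (including PC-relative ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: these structs are hypothetical, not a real object format. */
struct symbol { const char *name; uint32_t address; };  /* an exported name   */
struct reloc  { size_t offset; const char *name; };     /* a blank to fill in */

/* Find an exported symbol by name; returns 1 and sets *out on success. */
int resolve(const struct symbol *syms, size_t nsyms,
            const char *name, uint32_t *out)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *out = syms[i].address;
            return 1;
        }
    }
    return 0;
}

/* Overwrite each 4-byte placeholder with the address of the named symbol.
   Returns the number of references left unpatched (unknown symbol or
   offset out of range). */
size_t link_toy(uint8_t *code, size_t code_len,
                const struct reloc *relocs, size_t nrelocs,
                const struct symbol *syms, size_t nsyms)
{
    size_t unpatched = 0;
    assert(code != NULL);
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t addr;
        if (relocs[i].offset + sizeof addr > code_len ||
            !resolve(syms, nsyms, relocs[i].name, &addr)) {
            unpatched++;   /* a real linker would report an error here */
            continue;
        }
        memcpy(code + relocs[i].offset, &addr, sizeof addr);
    }
    return unpatched;
}
```

    A nonzero return value corresponds roughly to the "undefined reference" errors a real linker prints when no object file exports a name that some code needs.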


    For example, consider this piece of C code:

    extern int a;
    extern int bar(int);     // "extern" is redundant here
    static int zip(int);
    
    int foo(int x, int y)
    {
        return 2 * x + 3 * y + zip(x - y) + a * bar(x + y);
    }
    
    int zip(int n)
    {
        return 2 * (n + 1) - (n - 1) / 2;
    }
    

    This code exports one symbol, foo, which it provides to anyone who links in this translation unit. The function zip is declared static, so it has internal linkage: it is resolved within this translation unit and is not exported. The code also has two missing symbols, a and bar. In the code implementing foo, the references to a and bar are left blank and can only be filled in by the linker, once it knows where those entities actually reside.
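
    To make the other side of that contract concrete, here is a minimal sketch of a second translation unit that would supply the two missing symbols. The file name other.c and the particular value and function body are assumptions chosen purely for illustration; any definitions with matching names and types would satisfy the linker.

```c
/* other.c (hypothetical): exports the two symbols that foo's
   translation unit is missing. */

/* Definition of the global variable that the first file declares
   with "extern int a;". The value 3 is arbitrary. */
int a = 3;

/* Definition of the function that the first file declares
   with "extern int bar(int);". The body is arbitrary. */
int bar(int n)
{
    return n + 1;
}
```

    Compiling each file separately with `gcc -c` yields two object files; linking them together with a third object that defines main lets the linker match the blanks in foo against these two definitions.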

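    Here's the machine code that GCC generates for foo on x86-64 with -O3, shown as a disassembly with relocation entries (in the style of objdump -d -r output). Note that the call to zip has been inlined, so no reference to it remains: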

    0000000000000000 <foo>:
       0:   89 f9                   mov    ecx,edi
       2:   8d 04 76                lea    eax,[rsi+rsi*2]
       5:   53                      push   rbx
       6:   29 f1                   sub    ecx,esi
       8:   8d 51 ff                lea    edx,[rcx-0x1]
       b:   8d 1c 78                lea    ebx,[rax+rdi*2]
       e:   01 f7                   add    edi,esi
      10:   89 d0                   mov    eax,edx
      12:   c1 e8 1f                shr    eax,0x1f
      15:   01 c2                   add    edx,eax
      17:   d1 fa                   sar    edx,1
      19:   f7 da                   neg    edx
      1b:   8d 44 4a 02             lea    eax,[rdx+rcx*2+0x2]
      1f:   01 c3                   add    ebx,eax
      21:   e8 00 00 00 00          call   26 <foo+0x26>
      22:                                  R_X86_64_PC32       bar-0x4
      26:   0f af 05 00 00 00 00    imul   eax,DWORD PTR [rip+0x0]        # 2d <foo+0x2d>
      29:                                  R_X86_64_PC32       a-0x4
      2d:   01 d8                   add    eax,ebx
      2f:   5b                      pop    rbx
      30:   c3                      ret    
    

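    Note the relocation entries at offsets 0x22 and 0x29: the four operand bytes at each of those offsets are left at zero, and the annotation tells the linker the name of the symbol whose address must be filled in. The type R_X86_64_PC32 says the value to store is a 32-bit offset relative to the program counter, not an absolute address.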
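
    As a worked example of what the linker stores at offset 0x22, the x86-64 ABI defines an R_X86_64_PC32 relocation as S + A - P, where S is the final address of the symbol, A is the addend (-4 in the listing above) and P is the address of the bytes being patched. The sketch below applies that formula. The function names pc32_displacement and patch_pc32 are hypothetical, and the addresses used in the note that follows (foo at 0x401000, bar at 0x401040) are made up for illustration; a real linker picks them during layout.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored for an R_X86_64_PC32 relocation: S + A - P.
   s = final address of the symbol, a = addend from the relocation entry,
   p = final address of the 4 bytes being patched. */
int64_t pc32_displacement(uint64_t s, int64_t a, uint64_t p)
{
    return (int64_t)(s - p) + a;
}

/* Write the displacement into the section as 4 little-endian bytes.
   The assertion stands in for the overflow error a real linker would
   report if the target were out of 32-bit range. */
void patch_pc32(uint8_t *section, uint64_t offset, int64_t disp)
{
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    uint32_t v = (uint32_t)disp;
    for (int i = 0; i < 4; i++)
        section[offset + i] = (uint8_t)(v >> (8 * i));
}
```

    With the assumed addresses, P is 0x401000 + 0x22 = 0x401022, so the displacement is 0x401040 - 4 - 0x401022 = 0x1a, and the call instruction becomes e8 1a 00 00 00. The CPU adds the displacement to the address of the next instruction (0x401026), which gives 0x401040, the assumed address of bar. That is why the addend is -4: it compensates for the four operand bytes that lie between P and the next instruction.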