I would like to know what happens to object code when the linker turns it into an executable.
I presume the linker's job differs between Linux and Windows; I am on Linux.
Object code is lacking information about the big picture. It contains executable code for functions, but all references to other, external functions, as well as to global data, cannot be part of the actual instructions, since their addresses are not known. So instead all those references are left blank (e.g. just filled with zero bytes in the object code) and annotated with a symbol name.
It's the linker's job to look at all the missing symbol names and match them up against all the exported names (i.e. functions and global data provided by the object files), then find a permanent location for each datum, and finally rewrite all the code to replace the zero bytes with the actual addresses at which the data (functions and global variables) are ultimately stored.
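As a rough illustration of that matching step, here is a minimal sketch of a toy "linker" in C. The names toy_link, definition and reference, and the struct layouts, are invented for this example. Real object formats such as ELF are far richer, and this sketch writes full 8-byte absolute addresses rather than the relative offsets that real x86-64 code mostly uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A symbol exported by some object file (hypothetical, simplified). */
struct definition {
    const char *name;    /* symbol name, e.g. "bar" */
    uint64_t address;    /* final address the linker assigned to it */
};

/* A blank spot in the code, annotated with the symbol it needs. */
struct reference {
    const char *name;    /* symbol the code refers to */
    size_t offset;       /* where the blank 8 bytes sit in the code */
};

/* Look a name up among all exported definitions; NULL if nobody defines it. */
static const struct definition *
find_definition(const struct definition *defs, size_t ndefs, const char *name)
{
    for (size_t i = 0; i < ndefs; i++)
        if (strcmp(defs[i].name, name) == 0)
            return &defs[i];
    return NULL;
}

/* Overwrite every blank reference with the address of its symbol.
 * Returns 0 on success and -1 on the first undefined symbol or
 * out-of-range offset; earlier references stay patched in that case. */
int toy_link(uint8_t *code, size_t code_size,
             const struct reference *refs, size_t nrefs,
             const struct definition *defs, size_t ndefs)
{
    assert(code != NULL && refs != NULL && defs != NULL);
    for (size_t i = 0; i < nrefs; i++) {
        const struct definition *d =
            find_definition(defs, ndefs, refs[i].name);
        if (d == NULL)
            return -1;   /* a real linker reports an undefined reference */
        if (refs[i].offset > code_size ||
            code_size - refs[i].offset < sizeof d->address)
            return -1;   /* the blank would run past the end of the code */
        memcpy(code + refs[i].offset, &d->address, sizeof d->address);
    }
    return 0;
}
```

Calling toy_link with references to "a" and "bar" and a definition table containing both names fills the two blanks; dropping either definition makes it return -1, which corresponds to the familiar "undefined reference" link error.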
For example, consider this piece of C code:
extern int a;
extern int bar(int);  // "extern" is redundant here
static int zip(int);

int foo(int x, int y)
{
    return 2 * x + 3 * y + zip(x - y) + a * bar(x + y);
}

int zip(int n)
{
    return 2 * (n + 1) - (n - 1) / 2;
}
This code exports one symbol, foo, which it provides to anyone who links in this translation unit. It also has two missing symbols, a and bar. The function zip is declared static, so it has internal linkage and is neither exported nor missing. In the code implementing foo, the references to a and bar are left blank and can only be filled in by the linker, once it knows where those data actually reside.
Here's the machine code generated for x86-64 by GCC with -O3:
0000000000000000 <foo>:
0: 89 f9 mov ecx,edi
2: 8d 04 76 lea eax,[rsi+rsi*2]
5: 53 push rbx
6: 29 f1 sub ecx,esi
8: 8d 51 ff lea edx,[rcx-0x1]
b: 8d 1c 78 lea ebx,[rax+rdi*2]
e: 01 f7 add edi,esi
10: 89 d0 mov eax,edx
12: c1 e8 1f shr eax,0x1f
15: 01 c2 add edx,eax
17: d1 fa sar edx,1
19: f7 da neg edx
1b: 8d 44 4a 02 lea eax,[rdx+rcx*2+0x2]
1f: 01 c3 add ebx,eax
21: e8 00 00 00 00 call 26 <foo+0x26>
22: R_X86_64_PC32 bar-0x4
26: 0f af 05 00 00 00 00 imul eax,DWORD PTR [rip+0x0] # 2d <foo+0x2d>
29: R_X86_64_PC32 a-0x4
2d: 01 d8 add eax,ebx
2f: 5b pop rbx
30: c3 ret
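Note the relocation entries at offsets 0x22 and 0x29: the four-byte operands there are left at zero, but each carries an annotation telling the linker which symbol's address to fill in. Both operands are PC-relative, and the -0x4 in each annotation compensates for the fact that x86-64 measures such displacements from the end of the four-byte operand (the start of the next instruction) rather than from the operand itself.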
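The R_X86_64_PC32 annotations in the listing can be read as a small formula: the linker stores S + A - P in the four blank bytes, where S is the symbol's final address, A is the addend (-4 here) and P is the address of the blank field itself. The following is a minimal sketch of that calculation in C. The function names apply_pc32 and read_disp32 and the idea of passing a load address are assumptions made for illustration; a real linker works on whole sections and handles many more relocation types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Patch the 4-byte little-endian field at code[offset] with S + A - P.
 *   S = sym_addr            final address of the symbol
 *   A = addend              taken from the relocation entry (e.g. -4)
 *   P = load_addr + offset  final address of the field being patched
 * Returns 0 on success, -1 if the field is out of range or the
 * result does not fit in a signed 32-bit displacement. */
int apply_pc32(uint8_t *code, size_t code_size, uint64_t load_addr,
               size_t offset, uint64_t sym_addr, int64_t addend)
{
    assert(code != NULL);
    if (offset > code_size || code_size - offset < 4)
        return -1;

    uint64_t place = load_addr + offset;                            /* P */
    int64_t value = (int64_t)(sym_addr + (uint64_t)addend - place); /* S + A - P */
    if (value < INT32_MIN || value > INT32_MAX)
        return -1;   /* target too far away for a 32-bit displacement */

    uint32_t bits = (uint32_t)(int32_t)value;
    for (int i = 0; i < 4; i++)
        code[offset + i] = (uint8_t)(bits >> (8 * i));  /* little-endian */
    return 0;
}

/* Read a signed 32-bit little-endian displacement back out of the code. */
int32_t read_disp32(const uint8_t *p)
{
    uint32_t bits = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (int32_t)bits;
}
```

For instance, if foo ended up at the made-up address 0x401000 and bar at 0x401100, patching offset 0x22 with addend -4 would store 0x401100 - 4 - 0x401022 = 0xda. At run time the CPU adds that displacement to the address of the next instruction, 0x401026, and lands on 0x401100, which is bar.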