I understand how static/dynamic libraries are used by the linker/loader.
It's a pity the terms static library and dynamic library are both of the form ADJECTIVE library, because it perpetually leads programmers to think that they denote variants of essentially the same kind of thing. This is almost as misleading as the thought that a badminton court and a supreme court are essentially the same kind of thing. In fact it's far more misleading, since nobody actually suffers from thinking that a badminton court and a supreme court are essentially the same kind of thing.
Can someone throw some light on the differences between the contents of static and shared library files?
Let's use examples. To push back against the badminton court / supreme court fog, I'm going to use more accurate technical terms. Instead of static library I'll say ar archive, and instead of dynamic library I'll say dynamic shared object, or DSO for short.
What an ar archive is
I'll make an ar archive starting with these three files:
foo.c
#include <stdio.h>
void foo(void)
{
    puts("foo");
}
bar.c
#include <stdio.h>
void bar(void)
{
    puts("bar");
}
limerick.txt
There once was a young lady named bright
Whose speed was much faster than light
She set out one day
In a relative way
And returned on the previous night.
I'll compile those two C source files into position-independent object files:
$ gcc -c -Wall -fPIC foo.c
$ gcc -c -Wall -fPIC bar.c
There's no need for object files destined for an ar archive to be compiled with -fPIC. I just want these ones compiled that way.
Then I'll create an ar archive called libsundry.a containing the object files foo.o and bar.o, plus limerick.txt:
$ ar rcs libsundry.a foo.o bar.o limerick.txt
An ar archive is created, of course, with ar, the GNU general-purpose archiver. So it is not created by the linker. No linkage happens. Here's how ar reports the contents of the archive:
$ ar -t libsundry.a
foo.o
bar.o
limerick.txt
Here's what the limerick in the archive looks like:
$ rm limerick.txt
$ ar x libsundry.a limerick.txt; cat limerick.txt
There once was a young lady named bright
Whose speed was much faster than light
She set out one day
In a relative way
And returned on the previous night.
Q. What's the point of putting two object files and an ASCII limerick into the same ar archive?
A. To show that I can. To show that an ar archive is just a bag of files.
Let's see what file makes of libsundry.a:
$ file libsundry.a
libsundry.a: current ar archive
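To make the "bag of files" point concrete, here is a minimal sketch in C of how one might walk an ar archive by hand. It assumes the common ar layout: an 8-byte magic string followed by members, each introduced by a 60-byte text header. The function name ar_list is mine, purely for illustration. It does not decode GNU long-name tables, and on an archive built with ar rcs it would also report the symbol-index member that the s option adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"  /* 8-byte global header */
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60           /* fixed size of every member header */

/* Walk an ar archive held in memory. For each member header, print the raw
 * name field and the member size to `out` (pass NULL to print nothing).
 * Returns the number of member headers found, or -1 if the buffer does not
 * look like an ar archive. Illustrative helper, not part of any library. */
long ar_list(const unsigned char *buf, size_t len, FILE *out)
{
    assert(buf != NULL);
    if (len < AR_MAGIC_LEN || memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) != 0)
        return -1;

    size_t pos = AR_MAGIC_LEN;
    long count = 0;
    while (pos + AR_HDR_LEN <= len) {
        const unsigned char *hdr = buf + pos;

        /* Bytes 58-59 of every header hold the terminator "`\n". */
        if (hdr[58] != '`' || hdr[59] != '\n')
            return -1;

        /* Name is bytes 0-15, decimal size is bytes 48-57. */
        char name[17], size_field[11];
        memcpy(name, hdr, 16);
        name[16] = '\0';
        memcpy(size_field, hdr + 48, 10);
        size_field[10] = '\0';
        unsigned long size = strtoul(size_field, NULL, 10);

        /* The member data must fit in what is left of the buffer. */
        if (size > len - pos - AR_HDR_LEN)
            return -1;

        if (out != NULL)
            fprintf(out, "%s %lu bytes\n", name, size);
        count++;

        /* Data follows the header and is padded to an even length. */
        pos += AR_HDR_LEN + size + (size & 1);
    }
    return count;
}
```

Reading libsundry.a into memory and passing it to ar_list with stdout should list foo.o, bar.o and limerick.txt alike (GNU ar stores each name with a trailing slash). Nothing in the container distinguishes an object file from a limerick.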
Now I'll write a couple of programs that use libsundry.a in their linkage.
fooprog.c
extern void foo(void);
int main(void)
{
    foo();
    return 0;
}
Compile, link and run that one:
$ gcc -c -Wall fooprog.c
$ gcc -o fooprog fooprog.o -L. -lsundry
$ ./fooprog
foo
That's hunky dory. The linker apparently wasn't bothered by the presence of an ASCII limerick in libsundry.a. The reason for that is the linker didn't even try to link limerick.txt into the program. Let's do the linkage again, this time with a diagnostic option that will show us exactly what input files are linked:
$ gcc -o fooprog fooprog.o -L. -lsundry -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o
fooprog.o
(./libsundry.a)foo.o
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/5/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o
Lots of default libraries and object files there, but the only object files we have created that the linker consumed are:
fooprog.o
(./libsundry.a)foo.o
All that the linker did with ./libsundry.a was take foo.o out of the bag and link it in the program. After linking fooprog.o into the program, it needed to find a definition for foo. It looked in the bag. It found the definition in foo.o, so it took foo.o from the bag and linked it in the program. In linking fooprog, gcc -o fooprog fooprog.o -L. -lsundry is exactly the same linkage as:
$ gcc -o fooprog fooprog.o foo.o
What does file say about fooprog?
$ file fooprog
fooprog: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), \
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, \
for GNU/Linux 2.6.32, BuildID[sha1]=32525dce7adf18604b2eb5af7065091c9111c16e,
not stripped
Here's my second program:
foobarprog.c
extern void foo(void);
extern void bar(void);
int main(void)
{
    foo();
    bar();
    return 0;
}
Compile, link and run:
$ gcc -c -Wall foobarprog.c
$ gcc -o foobarprog foobarprog.o -L. -lsundry
$ ./foobarprog
foo
bar
And here's the linkage again with -trace:
$ gcc -o foobarprog foobarprog.o -L. -lsundry -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o
foobarprog.o
(./libsundry.a)foo.o
(./libsundry.a)bar.o
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/5/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o
So this time, our object files that the linker consumed were:
foobarprog.o
(./libsundry.a)foo.o
(./libsundry.a)bar.o
After linking foobarprog.o into the program, it needed to find definitions for foo and bar. It looked in the bag. It found definitions respectively in foo.o and bar.o, so it took them from the bag and linked them in the program. In linking foobarprog, gcc -o foobarprog foobarprog.o -L. -lsundry is exactly the same linkage as:
$ gcc -o foobarprog foobarprog.o foo.o bar.o
Summing all that up: an ar archive is just a bag of files. You can use an ar archive to offer to the linker a bunch of object files from which to pick the ones that it needs to continue the linkage. It will take those object files out of the bag and link them into the output file. It has absolutely no other use for the bag. The bag contributes nothing at all to the linkage.
The bag just makes your life a little simpler by sparing you the need to know exactly what object files you need for a particular linkage. You only need to know: Well, they're in that bag.
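The picking-from-the-bag behaviour can be sketched as a toy model in C. This is emphatically not how ld is implemented: the struct member type and the pick_members function are hypothetical names of mine, each member is assumed to define at most one symbol, and the model ignores the fact that a member pulled in may itself introduce new undefined references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One file in the bag. A non-object file such as limerick.txt defines
 * no symbol at all, so `defines` is NULL for it. */
struct member {
    const char *name;     /* e.g. "foo.o" */
    const char *defines;  /* the one symbol it defines, or NULL */
    bool taken;           /* set once the member is pulled into the link */
};

/* For each still-undefined symbol, look in the bag for a member that
 * defines it and mark that member as taken. Members that resolve nothing
 * are never touched. Returns how many members were newly taken. */
int pick_members(struct member *bag, size_t n_members,
                 const char *const *undefined, size_t n_undefined)
{
    assert(bag != NULL || n_members == 0);
    int newly_taken = 0;
    for (size_t u = 0; u < n_undefined; u++) {
        for (size_t m = 0; m < n_members; m++) {
            if (bag[m].defines != NULL &&
                strcmp(bag[m].defines, undefined[u]) == 0) {
                if (!bag[m].taken) {
                    bag[m].taken = true;
                    newly_taken++;
                }
                break;  /* symbol resolved; stop searching the bag */
            }
        }
    }
    return newly_taken;
}
```

With a bag holding foo.o, bar.o and limerick.txt, asking for foo alone takes only foo.o, and asking for foo and bar takes both object files. The limerick is never taken, because it defines nothing that anyone needs.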
What a DSO is
Let's make one.
foobar.c
extern void foo(void);
extern void bar(void);
void foobar(void)
{
    foo();
    bar();
}
We'll compile this new source file:
$ gcc -c -Wall -fPIC foobar.c
and then make a DSO using foobar.o and re-using libsundry.a:
$ gcc -shared -o libfoobar.so foobar.o -L. -lsundry -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
foobar.o
(./libsundry.a)foo.o
(./libsundry.a)bar.o
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/5/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o
That has made the DSO libfoobar.so. Notice: A DSO is made by the linker. It is linked just like a program is linked. The linkage of libfoobar.so looks very much like the linkage of foobarprog, but the addition of the option -shared instructs the linker to produce a DSO rather than a program. Here we see that our object files consumed by the linkage were:
foobar.o
(./libsundry.a)foo.o
(./libsundry.a)bar.o
ar does not understand a DSO at all:
$ ar -t libfoobar.so
ar: libfoobar.so: File format not recognised
But file does:
$ file libfoobar.so
libfoobar.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), \
dynamically linked, BuildID[sha1]=16747713db620e5ef14753334fea52e71fb3c5c8, \
not stripped
Now if we relink foobarprog using libfoobar.so instead of libsundry.a:
$ gcc -o foobarprog foobarprog.o -L. -lfoobar -Wl,-trace,--rpath=$(pwd)
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o
foobarprog.o
-lfoobar (./libfoobar.so)
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/5/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/5/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o
we see
foobarprog.o
-lfoobar (./libfoobar.so)
that ./libfoobar.so itself was linked. Not some object files "inside it". There aren't any object files inside it. And how this has contributed to the linkage can be seen in the dynamic dependencies of the program:
$ ldd foobarprog
linux-vdso.so.1 => (0x00007ffca47fb000)
libfoobar.so => /home/imk/develop/so/scrap/libfoobar.so (0x00007fb050eeb000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb050afd000)
/lib64/ld-linux-x86-64.so.2 (0x000055d8119f0000)
The program has come out with a runtime dependency on libfoobar.so. That's what linking a DSO does.
We can see this runtime dependency is satisfied. So the program will run:
$ ./foobarprog
foo
bar
just the same as before.
The fact that a DSO and a program - unlike an ar archive - are both products of the linker suggests that a DSO and a program are variants of essentially the same kind of thing. The file outputs suggested that too. A DSO and a program are both ELF binaries that the OS loader can map into a process address space. Not just a bag of files. An ar archive is not an ELF binary of any kind.
The difference between a program-type ELF file and a non-program-type ELF file lies in the different values that the linker writes into the ELF Header structure and Program Headers structure of the ELF file format. These differences instruct the OS loader to initiate a new process when it loads a program-type ELF file, and to augment the process that it has under construction when it loads a non-program ELF file. Thus a non-program DSO gets mapped into the process of its parent program. The fact that a program initiates a new process requires that a program shall have a single default entry point to which the OS will pass control: from the programmer's point of view, that entry point is the mandatory main function in a C or C++ program (strictly, control first reaches the startup code from crt1.o, which in turn leads to main being called). A non-program DSO, on the other hand, doesn't need a single mandatory entry point. It can be entered through any of the global functions it exports by function calls from the parent program.
But from the point of view of file structure and content, a DSO and a program are very similar things. They are files that can be components of a process. A program must be the initial component. A DSO can be a secondary component.
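One of those ELF Header values is the e_type field, and it is easy to inspect. Here is a minimal sketch in C, assuming only the standard ELF identification layout: the 4-byte magic, the byte-order flag at offset 5, and the 2-byte e_type at offset 16. The function name elf_kind and the strings it returns are my own labels, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Classify a file image by the e_type field of its ELF header.
 * Returns a descriptive string; "not ELF" if the magic is missing.
 * Illustrative helper, not part of any library. */
const char *elf_kind(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 18 || memcmp(buf, "\177ELF", 4) != 0)
        return "not ELF";

    /* e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big.
     * e_type is the 16-bit field at offsets 16-17 in both the 32-bit
     * and the 64-bit ELF header. */
    unsigned type = (buf[5] == 2)
        ? (((unsigned)buf[16] << 8) | buf[17])
        : (((unsigned)buf[17] << 8) | buf[16]);

    switch (type) {
    case 1:  return "ET_REL (relocatable object file)";
    case 2:  return "ET_EXEC (non-PIE program)";
    case 3:  return "ET_DYN (DSO or PIE program)";
    case 4:  return "ET_CORE (core dump)";
    default: return "other";
    }
}
```

Fed the first bytes of libfoobar.so, this should report ET_DYN. Fed foo.o it should report ET_REL, and fed libsundry.a it reports "not ELF", since an ar archive starts with its own magic string instead.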
It is still common for the further distinction to be made: A DSO must consist entirely of relocatable code (because there's no knowing at linktime where the loader may need to place it in a process address space), whereas a program consists of absolute code, always loaded at the same address. But in fact it's quite possible to link a relocatable program:
$ gcc -pie -o foobarprog foobarprog.o -L. -lfoobar -Wl,--rpath=$(pwd)
That's what -pie (Position Independent Executable) does here. And then:
$ file foobarprog
foobarprog: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), ....
file will say that foobarprog is a DSO, which it is, although it is also still a program:
$ ./foobarprog
foo
bar
And PIE executables are catching on. In Debian 9 and derivative distros (Ubuntu 17.04...) the GCC toolchain builds PIE programs by default.
If you hanker for detailed knowledge of the ar and ELF file formats, here are details of the ar format and here are details of the ELF format.
why not have a single type of library file accompanied by compiler flags which indicate how the library should be linked (static vs dynamic)?
The choice between dynamic and static linkage is already fully controllable by commandline linkage options, so there's no need to abandon either ar archives or DSOs or to invent another kind of library to achieve this. If the linker couldn't use ar archives the way it does, that would be a considerable inconvenience. And of course if the linker couldn't link DSOs we'd be back in the operating-system stone age.