I want to build a shared library on Linux that contains a big initialized array and use this array in different executables. I'd expect this to reduce the compilation output size, especially if several programs use this data. Unfortunately, this does not seem to be the case when the data in the shared object is marked as read-only.
Here is my "tab" symbol, 4 MiB in size, inside the object:
$ nm --print-size bigfile.o
0000000000000000 0000000000400000 R tab
I use ld to create a shared object:
ld -o libbigfile.so -shared bigfile.o
This results in a 4 MiB executable when linked with
gcc -o bigfile main.o libbigfile.so
The culprit seems to be the .data.rel.ro section:
$ readelf --section-headers bigfile
[21] .data.rel.ro PROGBITS 0000000000403dc0 00002dc0
0000000000400000 0000000000000000 WA 0 0 64
But as I can see with readelf -x .data.rel.ro bigfile, the .data.rel.ro section is full of 0x00 bytes.
So if the contents of the .rodata section of a shared object are only copied at load time, why do they take up space in the executable binary instead of being allocated at load time, as .bss is?
I have a very simple main:
#include <stdio.h>
extern char tab[];
int main() {
puts(tab);
return 0;
}
I produce my shared object from either a C or an assembly file, but the assembly file is smaller (sadly, C has no equivalent of the "times" prefix), so here is the assembly version:
global tab:data BYTESIZE
BYTESIZE equ (1 << 22)
section .rodata
align 64
tab:
times (BYTESIZE - 2) db 'A'
db 0xA
db 0x0
And to build it:
nasm -f elf64 bigfile.asm
ld -o libbigfile.so -shared bigfile.o
gcc -c main.c
gcc -o bigfile main.o libbigfile.so
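For the C route mentioned above, one option is GCC's range designators, a GNU C extension (also accepted by Clang, but not ISO C) that fills a whole index range, much like NASM's times does. The following is a minimal sketch of a hypothetical bigfile.c with the same layout as bigfile.asm:

```c
/* bigfile.c: hypothetical C counterpart of bigfile.asm. */
#include <assert.h>

#define BYTESIZE (1 << 22)

/* The range [0 ... BYTESIZE - 3] below is only valid if there is room for it. */
static_assert(BYTESIZE >= 3, "need room for the 'A' run plus \"\\n\\0\"");

/* [first ... last] = value is a GNU C range designator: it initializes every
 * element in the range, playing the role of NASM's "times" here. */
const char tab[BYTESIZE] = {
    [0 ... BYTESIZE - 3] = 'A',
    [BYTESIZE - 2] = '\n', /* matches "db 0xA" */
    [BYTESIZE - 1] = '\0', /* matches "db 0x0" */
};
```

It would be built with something like gcc -fPIC -shared -o libbigfile.so bigfile.c. Because tab is a const object with no relocations, it should land in .rodata, just as it does in the assembly version.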
Note: if I put the tab symbol in the .data section, the size problem disappears.
Why does it take that space in the executable binary instead of being allocated at load time as .bss does?
It's a bug (or rather a deficiency) in GNU ld: it didn't have to make a copy.
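The reason the linker has to make a copy of the data (symbol) in the main executable is copy relocations: code in a non-PIC executable refers to tab at a fixed, link-time address, so the linker reserves space for tab inside the executable and emits an R_X86_64_COPY relocation that tells the dynamic loader to copy the library's bytes there at startup.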
But there is no reason for the linker to make a copy of the symbol contents, and indeed LLD does not suffer from the same deficiency.
GNU-ld doesn't suffer from this either when the data is writable (as you noted), but does when the data is read-only.
// tab.c
const char tab[0x400000] = {'a'};
// main.c
#include <stdio.h>
extern const char tab[];
int main() { puts(tab); return 0; }
gcc -fPIC -shared -o tab.so tab.c && gcc main.c ./tab.so
readelf -Ws a.out | grep ' tab$'
4: 0000000000403de0 0x400000 OBJECT GLOBAL DEFAULT 21 tab
32: 0000000000403de0 0x400000 OBJECT GLOBAL DEFAULT 21 tab
readelf -WS a.out | grep '\[21\]'
[21] .data.rel.ro PROGBITS 0000000000403760 002760 12d687 00 WA 0 0 32
As you can see, GNU-ld puts a copy of the symbol contents into .data.rel.ro.
Now let's try using a different linker:
gcc main.c ./tab.so -fuse-ld=lld
ls -l a.out
rwxr-xr-x 1 user 6248 Dec 3 20:48 a.out
readelf -Ws a.out | grep ' tab$'
6: 00000000002028a0 0x12d687 OBJECT GLOBAL DEFAULT 22 tab
28: 00000000002028a0 0x12d687 OBJECT GLOBAL DEFAULT 22 tab
readelf -WS a.out | grep '\[22\]'
[22] .bss.rel.ro NOBITS 00000000002028a0 0008a0 400000 00 WA 0 0 32
LLD does not make the unnecessary copy, resulting in a much smaller executable.
Gold does the same as GNU-ld.
Note that even though LLD doesn't store a copy in the executable, a copy is still made at runtime (at program startup), so this is not ideal.
A possible solution is to build all your binaries with -fPIC, but that is not ideal either, because -fPIC code is slower and larger.
In your question you state that you want to reduce the compilation output size, since several programs use the same tab[] array.
If your goal is to reduce the total size occupied by N different programs and all these programs are shipped together, the best solution might be to avoid the shared library, and instead link all of these programs into a single binary, BusyBox style.