Does Disaggregating Sections in an ELF File Impact Performance?

I am investigating whether disaggregating sections in a statically linked ELF file affects performance. Specifically, I have multiple statically linked libraries (e.g., lib1, lib2, lib3, lib4). Instead of the conventional layout, which merges sections of the same type from all libraries, I give each library its own set of per-type sections.

In the original layout, as dictated by the default linker script, sections are aggregated per type: all .text together, all .rodata together, and so on. I modified the linker script to create separate sections per library instead, as follows:

.text.lib1   : { lib1.o(.text); }
.rodata.lib1 : { lib1.o(.rodata); }
.data.lib1   : { lib1.o(.data); }
.bss.lib1    : { lib1.o(.bss); }
...
.text.lib4   : { lib4.o(.text); }
.rodata.lib4 : { lib4.o(.rodata); }
.data.lib4   : { lib4.o(.data); }
.bss.lib4    : { lib4.o(.bss); }

As a result, the ELF file contains the following sections/segments:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 1] .text.lib1        PROGBITS        0000000000400000 001000 0010dc 00  AX  0   0  1
  [ 2] .rodata.lib1      PROGBITS        00000000004010e0 0020e0 00011d 00   A  0   0 32
  [ 3] .data.lib1        PROGBITS        0000000000401200 002200 000021 00  WA  0   0 16
  [ 4] .bss.lib1         NOBITS          0000000000401240 002221 000052 00  WA  0   0 32
  [ 5] .text.lib2        PROGBITS        0000000000402000 003000 0041a1 00  AX  0   0  1
  [ 6] .rodata.lib2      PROGBITS        00000000004061c0 0071c0 000378 00   A  0   0 32
  ...
Segment Sections...
   00
   01     .text.lib1 .rodata.lib1 .data.lib1 .bss.lib1 (RWE)
   02     .text.lib2 .rodata.lib2 .data.lib2 .bss.lib2 (RWE)
   03     .text.lib3 .rodata.lib3 .data.lib3 .bss.lib3 (RWE)
   04     .text.lib4 .rodata.lib4 .data.lib4 .bss.lib4 (RWE)
   05     .text.main .rodata.main .bss.main (RWE)

My main questions are:

How does this disaggregated layout impact performance, particularly in terms of paging, cache locality, and memory access patterns?
What are the known trade-offs between keeping sections grouped per library versus grouping them by type?

Any insights or references to relevant documentation would be greatly appreciated!

Solution

Your readelf output shows segments such as:

01 .text.lib1 .rodata.lib1 .data.lib1 .bss.lib1 (RWE)

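Because each library's .text, .rodata, .data and .bss share one RWE segment, you've made .text and .rodata writable, and .data and .bss executable. That is quite bad from a security perspective. It makes it much easier for an attacker to overwrite your instructions or read-only data, or to execute code injected into a data buffer.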

Some hardening systems prohibit mappings that are both writable and executable (a "W^X" policy). On such a system your binary will refuse to run at all.
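You can keep the per-library grouping without W+X segments by giving each library separate program headers with distinct permissions through a PHDRS command. The following is an untested sketch. It assumes GNU ld, the 0x400000 base address from your output, and your lib1.o naming. The segment names (lib1_text, lib1_rodata, lib1_data) are hypothetical. The FLAGS values are the ELF PF_* bits: PF_R = 4, PF_W = 2, PF_X = 1.

```ld
/* Sketch only (GNU ld): per-library grouping, but no segment is both
   writable and executable. Segment names are hypothetical. */
PHDRS
{
  lib1_text   PT_LOAD FLAGS(5);  /* R + X: code             */
  lib1_rodata PT_LOAD FLAGS(4);  /* R:     read-only data   */
  lib1_data   PT_LOAD FLAGS(6);  /* R + W: .data and .bss   */
  /* ... same three segments for lib2, lib3, lib4 and main ... */
}

SECTIONS
{
  . = 0x400000;
  .text.lib1   : { lib1.o(.text) }   :lib1_text
  /* Start each segment on a fresh page so that no page needs
     two different sets of permissions. */
  . = ALIGN(CONSTANT(MAXPAGESIZE));
  .rodata.lib1 : { lib1.o(.rodata) } :lib1_rodata
  . = ALIGN(CONSTANT(MAXPAGESIZE));
  .data.lib1   : { lib1.o(.data) }   :lib1_data
  .bss.lib1    : { lib1.o(.bss) }    :lib1_data
  . = ALIGN(CONSTANT(MAXPAGESIZE));
  /* ... same pattern for lib2, lib3, lib4 and main ... */
}
```

The cost is three page-aligned segments per library. Compared with your current layout, that means more padding and more mappings.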

As for how this layout affects paging, cache locality, and memory access patterns:

Suppose most accesses to functions and data stay within lib1. That is, main calls function_in_lib1(), and then nearly everything happens within lib1, with no calls into lib2 or lib3. In that case your layout may help cache locality, because lib1's code and data sit on a few adjacent pages.

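But that is quite rare. Libraries are usually built in layers, so lib1 is likely to call into lib2, which in turn is likely to call into lib3. If that is true, separating lib1's code from lib2's code is likely to increase TLB and cache misses. Each library's data, plus the page-alignment padding before the next library's segment (e.g. .text.lib2 starting at 0x402000), sits between the two, so the hot call chain spreads over more pages.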
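For comparison, here is an untested sketch of the by-type layout. It keeps the code of all libraries packed together in one read-only, executable region. It assumes GNU ld and your lib1.o naming. The object name main.o is a guess based on your .text.main section, and the input order lib1, lib2, ... is only an assumed call order. The sketch relies on ld's default segment assignment; add a PHDRS command if you need explicit control over segment permissions.

```ld
/* Sketch only (GNU ld): group by type, so code that calls across
   libraries is contiguous instead of being interleaved with data.
   main.o is a hypothetical object name. */
SECTIONS
{
  . = 0x400000;
  .text   : { lib1.o(.text)   lib2.o(.text)   lib3.o(.text)   lib4.o(.text)   main.o(.text) }
  . = ALIGN(CONSTANT(MAXPAGESIZE));
  .rodata : { lib1.o(.rodata) lib2.o(.rodata) lib3.o(.rodata) lib4.o(.rodata) main.o(.rodata) }
  . = ALIGN(CONSTANT(MAXPAGESIZE));
  .data   : { lib1.o(.data)   lib2.o(.data)   lib3.o(.data)   lib4.o(.data)   main.o(.data) }
  .bss    : { lib1.o(.bss)    lib2.o(.bss)    lib3.o(.bss)    lib4.o(.bss)    main.o(.bss) }
}
```

Here the only page-alignment padding is at the two permission boundaries, rather than after every library.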

In any case, separating "by library" is likely to be far less effective than other link-time optimizations, such as the -flto link-time optimization that both Clang and GCC support.
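A related layout technique works at function granularity rather than library granularity: place the functions that are hot at run time next to each other. This sketch is untested and assumes GNU ld with objects compiled with -ffunction-sections, so that each C function lands in its own .text.<name> input section. The function names are hypothetical placeholders for whatever a profile identifies as the hot call chain.

```ld
/* Sketch only (GNU ld, objects built with -ffunction-sections).
   hot_entry, hot_helper_lib2, hot_helper_lib3 are hypothetical names. */
SECTIONS
{
  . = 0x400000;
  .text : {
    /* Hot call chain first, packed onto as few pages as possible,
       regardless of which library each function came from. */
    *(.text.hot_entry)
    *(.text.hot_helper_lib2)
    *(.text.hot_helper_lib3)
    /* Everything else; ld uses the first matching pattern, so the
       functions listed above are not placed twice. */
    *(.text .text.*)
  }
}
```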