Consider the following two files on a Linux system:
use_message.cpp
#include <iostream>
extern const char* message;
void print_message();
int main() {
    std::cout << message << '\n';
    print_message();
}
libmessage.cpp
#include <iostream>
const char* message = "Meow!";  // 1. absolute address of string literal
                                //    needs runtime relocation in a .so
void print_message() {
    std::cout << message << '\n';
}
We can compile use_message.cpp into an object file, compile libmessage.cpp into a shared library, and link them together, like so:
$ g++ use_message.cpp -c -pie -o use_message.o
$ g++ libmessage.cpp -fPIC -shared -o libmessage.so
$ g++ use_message.o libmessage.so -o use_message
The definition for message originally lives in libmessage.so. When use_message is executed, the dynamic linker performs relocations that:
1. Update the message definition inside libmessage.so with the load address of the string data
2. Copy message from libmessage.so into use_message's .bss section
3. Update libmessage.so's GOT entry for message to point to the new copy of message inside use_message
The relevant relocations, as dumped by readelf, are:
use_message
Offset Info Type Sym. Value Sym. Name + Addend
000000004150 000c00000005 R_X86_64_COPY 0000000000004150 message + 0
This is relocation number 2 in the list above.
libmessage.so
Offset Info Type Sym. Value Sym. Name + Addend
000000004040 000000000008 R_X86_64_RELATIVE 2000
000000003fd8 000b00000006 R_X86_64_GLOB_DAT 0000000000004040 message + 0
These are relocation numbers 1 and 3, respectively.
There's a dependency between relocation numbers 1 and 2: the update to libmessage.so's message definition must happen before that value is copied into use_message; otherwise use_message's copy of message will not point to the correct location.
My question is: how is the order for applying relocations specified? Is there something encoded in the ELF files that specifies this? Or in the ABI? Or is the dynamic linker just expected to work out the dependencies between relocations itself and ensure that any relocations that write to a given memory address are run before any relocations that read from the same location? Does the static linker only output relocations such that the ones in the executable can always be processed after the shared library ones?
I think the relocation resolving order is not specified by any standard; each dynamic loader defines its own order. To support copy relocations, the main executable is relocated last. Linkers only produce copy relocations for executable links (-no-pie/-pie) and are aware of the dynamic loader semantics.
Quoting https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order:
There are two parts: the order within a module and the order between two modules.
glibc rtld processes relocations in the reverse search order (reversed l_initfini) with a special case for the rtld itself. The main executable needs to be processed last to handle R_*_COPY. If A has an ifunc referencing B, generally B needs to be relocated before A. Without ifunc, the resolving order of shared objects can be arbitrary.
Let's say we have the following dependency tree.
main
  dep1.so
    dep2.so
      dep3.so
        libc.so.6
      dep4.so
        dep3.so
        libc.so.6
      libc.so.6
    libc.so.6
l_initfini contains main, dep1.so, dep2.so, dep4.so, dep3.so, libc.so.6, ld.so. The relocation resolving order is ld.so (bootstrap), libc.so.6, dep3.so, dep4.so, dep2.so, dep1.so, main, ld.so.
Within a module, glibc rtld resolves relocations in order. Assuming that both DT_RELA (.rela.dyn) and DT_PLTREL (.rela.plt) are present, glibc's logic looks like the following:
// Simplified from elf/dynamic-link.h; each range is {start, size, lazy}
ranges[0] = {DT_RELA, DT_RELASZ, 0};            // .rela.dyn
ranges[1] = {DT_JMPREL, DT_PLTRELSZ, do_lazy};  // .rela.plt
if (!do_lazy && ranges[0].start + ranges[0].size == ranges[1].start) { // the equality is always satisfied in practice
  ranges[0].size += ranges[1].size;  // merge: process both tables in one pass
  ranges[1] = {};
}
for (int ranges_index = 0; ranges_index < 2; ++ranges_index)
  elf_dynamic_do_Rela (... ranges[ranges_index]);
musl's ldso/dynlink.c has:
/* The main program must be relocated LAST since it may contain
* copy relocations which depend on libraries' relocations. */
reloc_all(app.next);
reloc_all(&app);
FreeBSD rtld uses a more sophisticated order, which makes certain ifunc code more robust.
$ g++ use_message.cpp -c -pie -o use_message.o
$ g++ libmessage.cpp -fPIC -shared -o libmessage.so
$ g++ use_message.o libmessage.so -o use_message
BTW, use_message (built from -fPIE relocatable files) needs copy relocations because of GCC's HAVE_LD_PIE_COPYRELOC.
With Clang, and with GCC on other architectures, the PIE modes will not lead to copy relocations.