
File offset and virtual address in ELF


From the ELF documentation:

Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size.

What is the motivation for this rule? Following it leaves gaps (padding) in ELF files, which wastes disk space.


Solution

  • A modern OS doesn't load executables by simply reading the contents of each entire segment into memory. Rather, it uses demand paging: for each segment to be loaded, the relevant portion of the file is mapped into memory using mmap(2) or equivalent functionality. That way, the system only needs to load pages that are actually used; and if a page has not been used for a while, it can be dropped from memory and reloaded from the file as needed.

    Memory mapping of files happens only in units of pages, and the portion of the file to be mapped must start at a file offset that is a multiple of the page size (on Linux, mmap(2) fails with EINVAL otherwise). This fits well with the block-based I/O model of filesystems and disk hardware: filesystem and disk blocks are typically a power-of-two size less than or equal to the page size, and both the OS and DMA hardware prefer to transfer disk blocks to and from aligned memory addresses. Moreover, the data of a given file usually occupies dedicated blocks rather than being mixed with metadata or with another file's data.

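    So this ELF restriction ensures that every segment can be loaded by mapping a region of the file into memory with all alignment restrictions satisfied. Because p_vaddr and p_offset leave the same remainder modulo the page size, the loader can round both down to a page boundary: the resulting file offset is page-aligned, the resulting address is page-aligned, and the segment's first byte still lands exactly at p_vaddr. This in turn ensures that when individual pages are actually loaded into memory, it can be done by reading entire blocks directly from the disk, without having to copy the data again to fix up alignment.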
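
    A minimal sketch, in C, of the rounding step described above for a single PT_LOAD segment. The struct map_plan type and the congruent and plan_segment helpers are hypothetical names for illustration, not part of any real loader. A real loader would pass the resulting values to mmap(2) (with MAP_FIXED and the segment's protections) and would also zero-fill the part of p_memsz beyond p_filesz; both are omitted here.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical: the address/offset/length a loader would hand to mmap(2). */
    struct map_plan {
        uint64_t map_addr;   /* page-aligned virtual address */
        uint64_t map_offset; /* page-aligned file offset */
        uint64_t map_len;    /* bytes to map, covering the whole file image */
    };

    /* The ELF rule: p_vaddr and p_offset share the same remainder mod page size. */
    static bool congruent(uint64_t p_vaddr, uint64_t p_offset, uint64_t page)
    {
        return (p_vaddr % page) == (p_offset % page);
    }

    /* Round address and offset down by the same "slack". Because the remainders
       are equal, file byte p_offset still lands exactly at address p_vaddr. */
    static bool plan_segment(uint64_t p_vaddr, uint64_t p_offset, uint64_t p_filesz,
                             uint64_t page, struct map_plan *out)
    {
        assert(page != 0 && (page & (page - 1)) == 0); /* power-of-two page size */
        if (!congruent(p_vaddr, p_offset, page))
            return false; /* no page-aligned mapping can place this segment */
        uint64_t slack = p_offset & (page - 1);
        out->map_addr = p_vaddr - slack;
        out->map_offset = p_offset - slack;
        out->map_len = p_filesz + slack;
        return true;
    }
    ```

    For example, with 4 KB pages, a segment with p_vaddr = 0x401234 and p_offset = 0x1234 is mapped from file offset 0x1000 at address 0x401000, with length p_filesz + 0x234. If p_offset were 0x1235 instead, the remainders would differ and plan_segment would return false, since no page-aligned mapping could put that byte at that address.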

    The extra disk usage is unlikely to be significant. Looking at /usr/bin on a convenient Ubuntu system, it contains about 2500 executables totaling 1.1 GB. If we assume a typical executable has 4 loadable segments, and each wastes an average of 2 KB (half of a 4 KB page) due to this rule, then that's about 20 MB wasted, or slightly under 2% of the total. As a percentage of total system disk usage, it's even smaller. If we estimate that an SSD costs about US$100 per terabyte, then the cost of the wasted space is $0.002, i.e. two-tenths of one cent.
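
    The estimate above can be reproduced with a few lines of C. The function names are hypothetical, and every input (2500 executables, 4 loadable segments each, 4 KB pages, half a page of padding per segment, $100 per TB) is the rough assumption stated above, not a measurement. Decimal units are assumed throughout (1 MB = 1000 KB, 1 TB = 1,000,000 MB).

    ```c
    #include <assert.h>

    /* Padding in MB, assuming each loadable segment wastes half a page on average. */
    static double padding_mb(double executables, double segs_per_exe, double page_kb)
    {
        return executables * segs_per_exe * (page_kb / 2.0) / 1000.0; /* KB -> MB */
    }

    /* Padding as a fraction of the total size of the executables. */
    static double padding_fraction(double mb, double total_mb)
    {
        assert(total_mb > 0.0);
        return mb / total_mb;
    }

    /* Cost of the padding at a given storage price per terabyte. */
    static double padding_cost_usd(double mb, double usd_per_tb)
    {
        return mb / 1e6 * usd_per_tb; /* MB -> TB, then price */
    }
    ```

    With these inputs, padding_mb(2500, 4, 4) gives 20 MB, which is about 1.8% of 1100 MB, and padding_cost_usd(20, 100) gives $0.002.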