I am trying to understand the purpose of the LC_SEGMENT_SPLIT_INFO load command in the Mach-O executable format. I understand that it has something to do with rebasing/relocations that are specific to the dyld shared cache, but I'm missing the big picture.
The load command itself is simply a struct linkedit_data_command pointing into the __LINKEDIT segment. What I'm really asking is what its __LINKEDIT payload is for: why does it exist, and what is it used for?
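To make "pointing into the __LINKEDIT segment" concrete, here is a minimal sketch that walks the load commands of a 64-bit Mach-O image and returns the dataoff/datasize of its LC_SEGMENT_SPLIT_INFO command. It assumes a thin 64-bit file in host byte order. The structs are re-declared to mirror <mach-o/loader.h> so the sketch compiles off macOS, and find_split_info is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64           0xfeedfacfu
#define LC_SEGMENT_SPLIT_INFO 0x1eu

/* Re-declared to mirror <mach-o/loader.h>; field layout matches the header. */
struct mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype, ncmds, sizeofcmds, flags, reserved;
};
struct load_command { uint32_t cmd, cmdsize; };
struct linkedit_data_command { uint32_t cmd, cmdsize, dataoff, datasize; };

/* Hypothetical helper: returns 1 and fills dataoff/datasize if the image has an
   LC_SEGMENT_SPLIT_INFO command, 0 otherwise (or if the header looks malformed). */
static int find_split_info(const uint8_t *file, size_t len,
                           uint32_t *dataoff, uint32_t *datasize) {
    struct mach_header_64 hdr;
    if (len < sizeof hdr) return 0;
    memcpy(&hdr, file, sizeof hdr);
    if (hdr.magic != MH_MAGIC_64) return 0;   /* only same-endian 64-bit images */

    size_t off = sizeof hdr;                  /* load commands follow the header */
    for (uint32_t i = 0; i < hdr.ncmds; ++i) {
        struct load_command lc;
        if (off + sizeof lc > len) return 0;
        memcpy(&lc, file + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off) return 0;
        if (lc.cmd == LC_SEGMENT_SPLIT_INFO &&
            lc.cmdsize >= sizeof(struct linkedit_data_command)) {
            struct linkedit_data_command ld;
            memcpy(&ld, file + off, sizeof ld);
            *dataoff = ld.dataoff;            /* file offset inside __LINKEDIT */
            *datasize = ld.datasize;          /* size of the split-info payload */
            return 1;
        }
        off += lc.cmdsize;                    /* advance to the next command */
    }
    return 0;
}
```

The bytes at [dataoff, dataoff + datasize) are the payload the question is about.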
Some information about it I have gathered:
MachOFile::canBePlacedInDyldCache states that "dylib must have extra info for moving DATA and TEXT segments apart", and the code that follows makes clear that LC_SEGMENT_SPLIT_INFO is this "extra info".
The newer (V2) split-info payload starts with a DYLD_CACHE_ADJ_V2_FORMAT byte (0x7F).
Its layout is described in a comment in AdjustDylibSegments.cpp as:

```c
// Whole         :== <count> FromToSection+
// FromToSection :== <from-sect-index> <to-sect-index> <count> ToOffset+
// ToOffset      :== <to-sect-offset-delta> <count> FromOffset+
// FromOffset    :== <kind> <count> <from-sect-offset-delta>

#define DYLD_CACHE_ADJ_V2_POINTER_32          0x01
#define DYLD_CACHE_ADJ_V2_POINTER_64          0x02
#define DYLD_CACHE_ADJ_V2_DELTA_32            0x03
#define DYLD_CACHE_ADJ_V2_DELTA_64            0x04
#define DYLD_CACHE_ADJ_V2_ARM64_ADRP          0x05
#define DYLD_CACHE_ADJ_V2_ARM64_OFF12         0x06
#define DYLD_CACHE_ADJ_V2_ARM64_BR26          0x07
#define DYLD_CACHE_ADJ_V2_ARM_MOVW_MOVT       0x08
#define DYLD_CACHE_ADJ_V2_ARM_BR24            0x09
#define DYLD_CACHE_ADJ_V2_THUMB_MOVW_MOVT     0x0A
#define DYLD_CACHE_ADJ_V2_THUMB_BR22          0x0B
#define DYLD_CACHE_ADJ_V2_IMAGE_OFF_32        0x0C
#define DYLD_CACHE_ADJ_V2_THREADED_POINTER_64 0x0D
```
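The grammar above can be walked with a short decoder. This is a sketch, not dyld's code, and it rests on assumptions: every field is ULEB128-encoded, each FromOffset's <count> is the number of <from-sect-offset-delta> values that follow it, to-offsets accumulate within one FromToSection, and from-offsets restart at zero for each FromOffset group. split_ref and parse_split_info_v2 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define DYLD_CACHE_ADJ_V2_FORMAT 0x7F

/* One decoded reference: a location in the "from" section that refers into the "to" section. */
typedef struct {
    uint64_t kind;      /* one of the DYLD_CACHE_ADJ_V2_* values */
    uint64_t fromSect;  /* <from-sect-index> */
    uint64_t fromOff;   /* offset of the instruction/pointer inside fromSect */
    uint64_t toSect;    /* <to-sect-index> */
    uint64_t toOff;     /* offset of the target inside toSect */
} split_ref;

/* Reads one ULEB128 value; clears *ok on truncation or encodings longer than 10 bytes. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end, int *ok) {
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift >= 64) break;
        result |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
        shift += 7;
    }
    *ok = 0;
    return 0;
}

/* Decodes a V2 payload. Stores up to cap references in out and returns the
   total number found, or -1 if the payload is malformed. */
static long parse_split_info_v2(const uint8_t *buf, size_t len, split_ref *out, size_t cap) {
    const uint8_t *p = buf, *end = buf + len;
    int ok = 1;
    long found = 0;
    if (len == 0 || *p++ != DYLD_CACHE_ADJ_V2_FORMAT) return -1;
    uint64_t pairCount = read_uleb128(&p, end, &ok);            /* Whole: <count> */
    for (uint64_t i = 0; ok && i < pairCount; ++i) {
        uint64_t fromSect = read_uleb128(&p, end, &ok);
        uint64_t toSect = read_uleb128(&p, end, &ok);
        uint64_t toCount = read_uleb128(&p, end, &ok);
        uint64_t toOff = 0;                                     /* assumed: accumulates per section pair */
        for (uint64_t j = 0; ok && j < toCount; ++j) {
            toOff += read_uleb128(&p, end, &ok);                /* <to-sect-offset-delta> */
            uint64_t fromCount = read_uleb128(&p, end, &ok);
            for (uint64_t k = 0; ok && k < fromCount; ++k) {
                uint64_t kind = read_uleb128(&p, end, &ok);
                uint64_t deltaCount = read_uleb128(&p, end, &ok);
                uint64_t fromOff = 0;                           /* assumed: restarts per group */
                for (uint64_t l = 0; ok && l < deltaCount; ++l) {
                    fromOff += read_uleb128(&p, end, &ok);      /* <from-sect-offset-delta> */
                    if (!ok) break;
                    if (out && (size_t)found < cap)
                        out[found] = (split_ref){kind, fromSect, fromOff, toSect, toOff};
                    ++found;
                }
            }
        }
    }
    return ok ? found : -1;
}
```

Fed with the bytes at dataoff/datasize of the LC_SEGMENT_SPLIT_INFO command, a decoder like this would list each cross-section reference together with the kind of patch it needs. Trailing bytes after the last entry are ignored because the payload may be padded.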
In the older (V1) format, kind 3 is now "used for arm64 ADRP"; it appears its original purpose was PPC hi16 encoding.
(You may wonder why I am asking this. It has simply piqued my curiosity. Sometimes, when I see something I don't understand, I feel the urge to understand it.)
The comment you found in the dyld source already says it:
dylib must have extra info for moving DATA and TEXT segments apart
When a dyld shared cache is created, the TEXT segments of all dylibs are extracted and merged into one big executable segment. The same happens with all DATA_CONST segments and all DATA segments, respectively.
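The problem is that any non-trivial dylib contains instructions that generate the addresses of things in DATA, as well as pointers in DATA that point to specific code in TEXT. The latter are already covered by the rebase information required for ASLR, but the former are not. Such instructions are typically PC-relative (for example an ADRP/ADD pair on arm64 or a RIP-relative LEA on x86_64), so they encode the distance between TEXT and DATA that was fixed at link time, and that distance changes once the segments are moved apart. To put a dylib into the shared cache, the instructions in its TEXT segment that generate addresses in the DATA segment must therefore be patched; otherwise they will refer to the wrong addresses in the cache. LC_SEGMENT_SPLIT_INFO records where those references are and what kind of patch each one needs.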
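As an illustration of what patching a DYLD_CACHE_ADJ_V2_ARM64_ADRP reference involves, here is a sketch that decodes the page delta of an arm64 ADRP instruction and re-encodes it for a new instruction address and target address. The helper names are hypothetical and this is not dyld's implementation. Only the ADRP itself is handled: the paired ADD/LDR low-12-bit immediate (the OFF12 kind) only needs to change if the target's offset within its 4 KiB page changes.

```c
#include <stdint.h>

/* ADRP layout: bit 31 = 1, bits 30:29 = immlo, bits 28:24 = 10000,
   bits 23:5 = immhi, bits 4:0 = Rd. The 21-bit immediate counts 4 KiB pages. */

/* Hypothetical helper: returns the signed page delta encoded in an ADRP. */
static int64_t adrp_pages(uint32_t insn) {
    uint32_t immlo = (insn >> 29) & 0x3;
    uint32_t immhi = (insn >> 5) & 0x7FFFF;
    int64_t imm = (int64_t)((immhi << 2) | immlo);
    if (imm & (1 << 20)) imm -= (1 << 21);      /* sign-extend 21 bits */
    return imm;
}

/* Hypothetical helper: rewrites the ADRP so that, executed at newPC, it yields
   the page of newTarget. Keeps the opcode bits and Rd. Returns 0 (and leaves
   *insn untouched) if the distance exceeds ADRP's +/-4 GiB range. */
static int adrp_retarget(uint32_t *insn, uint64_t newPC, uint64_t newTarget) {
    int64_t pages = (int64_t)(newTarget >> 12) - (int64_t)(newPC >> 12);
    if (pages < -(1 << 20) || pages >= (1 << 20)) return 0;
    uint32_t imm = (uint32_t)pages & 0x1FFFFF;  /* two's-complement 21-bit field */
    *insn = (*insn & 0x9F00001Fu)               /* keep op bit, 10000 pattern, Rd */
          | ((imm & 0x3) << 29)                 /* immlo */
          | ((imm >> 2) << 5);                  /* immhi */
    return 1;
}
```

For instance, if an ADRP at address 0x1000 used to reach a page at 0x5000 and the cache builder moves DATA so the target now lands at 0x20005000, adrp_retarget(&insn, 0x1000, 0x20005000) rewrites the immediate to 0x20004 pages; the rebase information alone could not express this change because it only describes pointers, not instruction encodings.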