dyld mach-o

What is the purpose of LC_SEGMENT_SPLIT_INFO?


I am trying to understand what the purpose of the LC_SEGMENT_SPLIT_INFO load command in the Mach-O executable format is. I understand that it does have something to do with rebase/relocations which are specific to the dyld shared cache, but I'm missing the big picture.

The load command itself is simply a struct linkedit_data_command pointing into the __LINKEDIT segment. What I'm really asking is: what is the purpose of its __LINKEDIT payload, why does it exist, and what is it used for?

Some information about it I have gathered:

    // Whole         :== <count> FromToSection+
    // FromToSection :== <from-sect-index> <to-sect-index> <count> ToOffset+
    // ToOffset      :== <to-sect-offset-delta> <count> FromOffset+
    // FromOffset    :== <kind> <count> <from-sect-offset-delta>
    #define DYLD_CACHE_ADJ_V2_POINTER_32            0x01
    #define DYLD_CACHE_ADJ_V2_POINTER_64            0x02
    #define DYLD_CACHE_ADJ_V2_DELTA_32              0x03
    #define DYLD_CACHE_ADJ_V2_DELTA_64              0x04
    #define DYLD_CACHE_ADJ_V2_ARM64_ADRP            0x05
    #define DYLD_CACHE_ADJ_V2_ARM64_OFF12           0x06
    #define DYLD_CACHE_ADJ_V2_ARM64_BR26            0x07
    #define DYLD_CACHE_ADJ_V2_ARM_MOVW_MOVT         0x08
    #define DYLD_CACHE_ADJ_V2_ARM_BR24              0x09
    #define DYLD_CACHE_ADJ_V2_THUMB_MOVW_MOVT       0x0A
    #define DYLD_CACHE_ADJ_V2_THUMB_BR22            0x0B
    #define DYLD_CACHE_ADJ_V2_IMAGE_OFF_32          0x0C
    #define DYLD_CACHE_ADJ_V2_THREADED_POINTER_64   0x0D
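
For reference, the grammar in that comment can be walked with a small decoder. This is only a sketch under two assumptions that the comment itself does not state: every value is ULEB128-encoded, and the offset deltas accumulate within their enclosing list (to-offsets per section pair, from-offsets per kind). The names `walk_split_info_v2`, `split_ref_fn`, `ref_stats` and `record_ref` are made up for illustration and are not dyld APIs. The caller is expected to pass a pointer to where the grammar starts, past any format marker byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one ULEB128 value and advance *p. On overrun, clears *ok and returns 0. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end, int *ok)
{
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            return result;
        shift += 7;
    }
    *ok = 0;
    return 0;
}

/* Hypothetical callback: one call per (from-location, to-location, kind) reference. */
typedef void (*split_ref_fn)(uint64_t from_sect, uint64_t from_off,
                             uint64_t to_sect, uint64_t to_off,
                             uint64_t kind, void *ctx);

/* Walk "Whole" as described by the grammar. Returns 1 on success, 0 if truncated. */
static int walk_split_info_v2(const uint8_t *p, const uint8_t *end,
                              split_ref_fn fn, void *ctx)
{
    assert(p != NULL && end != NULL && p <= end);
    int ok = 1;
    uint64_t pair_count = read_uleb128(&p, end, &ok);            /* Whole */
    for (uint64_t i = 0; ok && i < pair_count; i++) {
        uint64_t from_sect = read_uleb128(&p, end, &ok);         /* FromToSection */
        uint64_t to_sect   = read_uleb128(&p, end, &ok);
        uint64_t to_count  = read_uleb128(&p, end, &ok);
        uint64_t to_off = 0;                                     /* assumed: deltas accumulate */
        for (uint64_t j = 0; ok && j < to_count; j++) {
            to_off += read_uleb128(&p, end, &ok);                /* ToOffset */
            uint64_t kind_count = read_uleb128(&p, end, &ok);
            for (uint64_t k = 0; ok && k < kind_count; k++) {
                uint64_t kind       = read_uleb128(&p, end, &ok); /* FromOffset */
                uint64_t from_count = read_uleb128(&p, end, &ok);
                uint64_t from_off = 0;                           /* assumed: reset per kind */
                for (uint64_t l = 0; ok && l < from_count; l++) {
                    from_off += read_uleb128(&p, end, &ok);
                    if (ok)
                        fn(from_sect, from_off, to_sect, to_off, kind, ctx);
                }
            }
        }
    }
    return ok;
}

/* Example callback that just counts references and remembers the last one. */
struct ref_stats { uint64_t count, last_from_off, last_to_off, last_kind; };

static void record_ref(uint64_t from_sect, uint64_t from_off,
                       uint64_t to_sect, uint64_t to_off,
                       uint64_t kind, void *ctx)
{
    struct ref_stats *st = ctx;
    (void)from_sect; (void)to_sect;
    st->count++;
    st->last_from_off = from_off;
    st->last_to_off = to_off;
    st->last_kind = kind;
}
```

Read this way, each decoded tuple says "the location at this offset in the from-section refers to this offset in the to-section, using this kind of encoding", which fits the list of kinds above.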

(You may wonder why I am asking this. It has simply piqued my curiosity. Sometimes, when I see something I don't understand, I feel the urge to understand it.)


Solution

  • The comment you found in dyld source already says it:

    dylib must have extra info for moving DATA and TEXT segments apart

    When creating a dyld shared cache, the TEXT segments of all dylibs are extracted and merged into one big executable segment. The same happens with all DATA_CONST segments and all DATA segments, respectively.

    The problem is that any non-trivial dylib will contain instructions in TEXT that generate the addresses of things in DATA, and there will be pointers in DATA that point to specific code in TEXT. The latter is already encoded in the rebasing information required for ASLR, but the former is not. In order to put a dylib into the shared cache, instructions in its TEXT segment that generate addresses in the DATA segment need to be changed, otherwise they will refer to the wrong addresses in the cache. The split-info payload records where those references are, so that they can be adjusted once the segments have been moved apart.
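
To make the TEXT-to-DATA case concrete, here is a minimal sketch of what adjusting one such instruction could look like for an arm64 `ADRP`, which encodes the distance in 4 KiB pages between the instruction and its target. It is an illustration only, not dyld's cache builder code; the helper names `is_adrp`, `adrp_get_page_delta` and `adrp_retarget` are hypothetical, and the addresses in the example below are invented.

```c
#include <assert.h>
#include <stdint.h>

/* True if insn is an AArch64 ADRP: bit 31 set and bits 28..24 equal to 0b10000. */
static int is_adrp(uint32_t insn)
{
    return (insn & 0x9F000000u) == 0x90000000u;
}

/* Return the signed 21-bit page delta, stored as immhi (bits 23..5) : immlo (bits 30..29). */
static int64_t adrp_get_page_delta(uint32_t insn)
{
    assert(is_adrp(insn));
    uint32_t immlo = (insn >> 29) & 0x3u;
    uint32_t immhi = (insn >> 5) & 0x7FFFFu;
    int64_t imm21 = (int64_t)((immhi << 2) | immlo);
    if (imm21 & 0x100000)
        imm21 -= 0x200000;              /* sign-extend from 21 bits */
    return imm21;
}

/* Re-encode insn so that, when executed at new_pc, it yields the page of new_target.
 * Returns 1 on success, 0 if the new distance does not fit in 21 signed bits. */
static int adrp_retarget(uint32_t insn, uint64_t new_pc, uint64_t new_target,
                         uint32_t *out)
{
    assert(is_adrp(insn));
    int64_t pages = (int64_t)(new_target >> 12) - (int64_t)(new_pc >> 12);
    if (pages < -0x100000 || pages > 0xFFFFF)
        return 0;
    uint32_t imm21 = (uint32_t)pages & 0x1FFFFFu;
    insn &= 0x9F00001Fu;                /* keep opcode bits and destination register */
    insn |= (imm21 & 0x3u) << 29;       /* immlo */
    insn |= (imm21 >> 2) << 5;          /* immhi */
    *out = insn;
    return 1;
}
```

For example, an `adrp x0` encoded as 0xF0000020 carries a delta of 7 pages. If the instruction ends up at 0x180002000 in the cache while its target lands at 0x1C0008010, the delta becomes 0x40006 pages and the instruction has to be re-encoded as 0xD0200020. Nothing in the ASLR rebase information would tell the cache builder that this instruction exists or what it points at.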