
Parsing universal/fat binary files


I'm working on a project to implement a basic nm that memory-maps its input file with mmap. I have been able to parse 64-bit binaries using this code:

void        handle_64(char *ptr)
{
    int                     ncmds;
    struct mach_header_64   *header;
    struct load_command     *lc;
    struct symtab_command   *sym;
    int                     i;

    i = 0;
    header = (struct mach_header_64 *)ptr;
    ncmds = header->ncmds;
    lc = (void *)ptr + sizeof(*header);
    while (i < ncmds)
    {
        if (lc->cmd == LC_SYMTAB)
        {
            sym = (struct symtab_command *)lc;
            build_list (sym->nsyms, sym->symoff, sym->stroff, ptr);
            break;
         }
         lc = (void *) lc + lc->cmdsize;
         i++;
    }
}

According to this link, the only difference between a Mach-O and a fat binary is the fat_header struct above it, but simply skipping over it with

lc = (void *)ptr + sizeof(struct fat_header) + sizeof(struct mach_header_64);

doesn't get me to the load_command area (segfault). How do I access the load commands of a fat/universal binary?

I'm working on a 64-bit Mac running macOS High Sierra. Thank you.


Solution

  • You've got multiple problems:

    Considering all of that, you need to parse the fat header (and not just ignore it) if you want any hope of getting useful results.

    Now, fat_header is defined as follows:

    struct fat_header {
        uint32_t    magic;      /* FAT_MAGIC or FAT_MAGIC_64 */
        uint32_t    nfat_arch;  /* number of structs that follow */
    };
    

    First, the magic value that I usually see for fat binaries is FAT_CIGAM rather than FAT_MAGIC, despite the comment stating otherwise. Take care: the integers in the fat header are stored big endian rather than little endian, so a little-endian host reads every field byte-swapped and has to swap it back. Second, the comment says that certain structs follow this header, namely:

    struct fat_arch {
        cpu_type_t  cputype;    /* cpu specifier (int) */
        cpu_subtype_t   cpusubtype; /* machine specifier (int) */
        uint32_t    offset;     /* file offset to this object file */
        uint32_t    size;       /* size of this object file */
        uint32_t    align;      /* alignment as a power of 2 */
    };
    

    The fat_arch array follows the fat header the same way load commands follow a "thin" Mach-O header. fat_arch.offset is measured from the very beginning of the file. With that, it's quite simple to print all slices of a fat Mach-O:

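    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <mach-o/fat.h>
    
    /* Reverse the byte order of a 32-bit value (big-endian file, little-endian host). */
    #define SWAP32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0xff0000) >> 8) | (((x) & 0xff00) << 8) | (((x) & 0xff) << 24))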
    
    void print_fat_header(void *buf)
    {
        struct fat_header *hdr = buf;
        if(hdr->magic != FAT_CIGAM)
        {
            fprintf(stderr, "bad magic: %08x\n", hdr->magic);
            return;
        }
        struct fat_arch *archs = (struct fat_arch*)(hdr + 1);
        uint32_t num = SWAP32(hdr->nfat_arch);
        for(size_t i = 0; i < num; ++i)
        {
            const char *name = "unknown";
            switch(SWAP32(archs[i].cputype))
            {
                case CPU_TYPE_I386:     name = "i386";      break;
                case CPU_TYPE_X86_64:   name = "x86_64";    break;
                case CPU_TYPE_ARM:      name = "arm";       break;
                case CPU_TYPE_ARM64:    name = "arm64";     break;
            }
            uint32_t off = SWAP32(archs[i].offset);
            uint32_t magic = *(uint32_t*)((uintptr_t)buf + off);
            printf("%08x-%08x: %-8s (magic %8x)\n", off, off + SWAP32(archs[i].size), name, magic);
        }
    }
    

    Note that the above function is incomplete, as it does not know the length of buf and thus cannot and does not check any accessed memory against it. In a serious implementation, you should make sure to never read outside the buffer you're given. The fact that your code segfaulted also hints at it not doing enough data sanitisation.
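
    As a rough illustration of the bounds checking mentioned above, here is a minimal sketch that looks up one slice while checking every access against the buffer length. The names find_slice, read_be32 and the SKETCH_* constants are hypothetical and not part of any Apple header; the constant values are copied from <mach-o/fat.h> and <mach/machine.h> so the sketch compiles without the macOS SDK. It assumes a 32-bit fat header (it does not handle FAT_MAGIC_64) and reads each field byte by byte as big endian, so it needs no SWAP32 and does not depend on the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: values copied from <mach-o/fat.h> and <mach/machine.h>.
   On macOS, include those headers and use FAT_MAGIC etc. instead. */
#define SKETCH_FAT_MAGIC        0xcafebabeu  /* magic as stored on disk (big endian) */
#define SKETCH_CPU_TYPE_X86_64  0x01000007u  /* CPU_TYPE_X86 | CPU_ARCH_ABI64 */
#define SKETCH_FAT_HEADER_SIZE  8u           /* sizeof(struct fat_header) */
#define SKETCH_FAT_ARCH_SIZE    20u          /* sizeof(struct fat_arch) */

/* Read a big-endian 32-bit integer byte by byte (independent of host endianness). */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: find the slice for `cputype` in a fat file mapped at
 * `buf` with `len` bytes. On success, store the slice's file offset and size
 * and return 0. Return -1 if the buffer is not a 32-bit fat binary, is
 * truncated or malformed, or contains no such slice.
 */
static int find_slice(const void *buf, size_t len, uint32_t cputype,
                      uint32_t *off_out, uint32_t *size_out)
{
    const unsigned char *p = buf;

    if (len < SKETCH_FAT_HEADER_SIZE || read_be32(p) != SKETCH_FAT_MAGIC)
        return -1;

    uint32_t num = read_be32(p + 4);         /* fat_header.nfat_arch */
    /* The whole fat_arch array must fit inside the buffer. */
    if (num > (len - SKETCH_FAT_HEADER_SIZE) / SKETCH_FAT_ARCH_SIZE)
        return -1;

    for (uint32_t i = 0; i < num; ++i)
    {
        const unsigned char *arch =
            p + SKETCH_FAT_HEADER_SIZE + (size_t)i * SKETCH_FAT_ARCH_SIZE;
        uint32_t off  = read_be32(arch + 8);   /* fat_arch.offset */
        uint32_t size = read_be32(arch + 12);  /* fat_arch.size */

        if (read_be32(arch) != cputype)        /* fat_arch.cputype */
            continue;
        /* The slice itself must also lie entirely inside the buffer. */
        if (off > len || size > len - off)
            return -1;
        *off_out = off;
        *size_out = size;
        return 0;
    }
    return -1;
}
```

    With the question's code, a call such as find_slice(ptr, file_size, SKETCH_CPU_TYPE_X86_64, &off, &size) followed by handle_64(ptr + off) on success would be one way to reach the load commands, where file_size is assumed to be the mapped length (for example from fstat). The slice is a complete thin Mach-O, so offsets such as symoff and stroff count from the start of the slice, not from the start of the fat file. handle_64 itself would still need the slice size to check its own accesses.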