c macos executable

How to read Mach-O header from object file?


I have spent the past few days experimenting with assembly, and now understand the relationship between assembly and machine code (using x86 via NASM on OSX, reading the Intel docs).

Now I am trying to understand the details of how the linker works, and specifically want to understand the structure of Mach-O object files, starting with the Mach-O headers.

My question is: can you show how the bytes of the Mach-O header below correspond to the otool output (which displays the same header, but in a different format)?

Some reasons for this question include:

Below I show the example and process I went through to try to decode the Mach-O header from a real object file. Throughout the descriptions below, I try to show hints of all the little/subtle questions that arise. Hopefully this will provide a sense of how this can be very confusing to a newcomer.


Example

Starting with a basic C file called example.c:

#include <stdio.h>

int
main() {
  printf("hello world");
  return 0;
}

Compile it with gcc example.c -o example.out. A hex dump of the resulting file begins:

cffa edfe 0700 0001 0300 0080 0200 0000
1000 0000 1005 0000 8500 2000 0000 0000
1900 0000 4800 0000 5f5f 5041 4745 5a45
524f 0000 0000 0000 0000 0000 0000 0000
0000 0000 0100 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 1900 0000 2802 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
0000 0000 0100 0000 0010 0000 0000 0000
0000 0000 0000 0000 0010 0000 0000 0000
0700 0000 0500 0000 0600 0000 0000 0000
5f5f 7465 7874 0000 0000 0000 0000 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
400f 0000 0100 0000 2d00 0000 0000 0000
400f 0000 0400 0000 0000 0000 0000 0000
0004 0080 0000 0000 0000 0000 0000 0000
5f5f 7374 7562 7300 0000 0000 0000 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
6e0f 0000 0100 0000 0600 0000 0000 0000
6e0f 0000 0100 0000 0000 0000 0000 0000
0804 0080 0000 0000 0600 0000 0000 0000
5f5f 7374 7562 5f68 656c 7065 7200 0000
... 531 total lines of this

Run otool -h example.out, which prints:

example.out:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80          2    16       1296 0x00200085

Research

To understand the Mach-O file format, I found these resources helpful:

Those last 3 from github.com/apple-oss-distributions contain all the constants, such as these:

#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
...
#define CPU_TYPE_MC680x0  ((cpu_type_t) 6)
#define CPU_TYPE_X86    ((cpu_type_t) 7)
#define CPU_TYPE_I386   CPU_TYPE_X86    /* compatibility */
#define CPU_TYPE_X86_64   (CPU_TYPE_X86 | CPU_ARCH_ABI64)

The structure of the Mach-O header is shown as:

struct mach_header_64 {
  uint32_t  magic;    /* mach magic number identifier */
  cpu_type_t  cputype;  /* cpu specifier */
  cpu_subtype_t cpusubtype; /* machine specifier */
  uint32_t  filetype; /* type of file */
  uint32_t  ncmds;    /* number of load commands */
  uint32_t  sizeofcmds; /* the size of all the load commands */
  uint32_t  flags;    /* flags */
  uint32_t  reserved; /* reserved */
};

Given this information, the goal was to find each of those pieces of the Mach-O header in the example.out object file.


First: Finding the "magic" number

Given that example and research, I was able to identify the first part of the Mach-O header, the "magic number". That was cool.

But it wasn't a straightforward process. Here are the pieces of information that had to be collected to figure that out.

Here are the 3 numbers, which were enough to sort of figure out what the magic number is:

0xcffaedfe // value from MH_CIGAM_64
0xfeedfacf // value from otool
cffa edfe  // value in example.out

So that's exciting! Still not totally sure if I am coming to the right conclusion about these numbers, but hope so.


Next: Finding the cputype

Now it starts to get confusing. Here are the pieces that needed to be put together to almost make sense of it, but this is where I'm stuck so far:

Here are the relevant constants needed to calculate the value of CPU_TYPE_X86_64:

#define CPU_ARCH_ABI64  0x01000000      /* 64 bit ABI */
#define CPU_TYPE_X86        ((cpu_type_t) 7)
#define CPU_TYPE_I386       CPU_TYPE_X86        /* compatibility */
#define CPU_TYPE_X86_64     (CPU_TYPE_X86 | CPU_ARCH_ABI64)

So basically:

CPU_TYPE_X86_64 = 7 | 0x01000000 // 16777223

That number 16777223 matches what is shown by otool, nice!

Next, I tried to find that number in example.out, but it isn't there, because 16777223 is decimal and the dump is hexadecimal. So I converted it to hex in JavaScript:

> (16777223).toString(16)
'1000007'

I'm not sure whether this is the correct way to generate a hex number, especially one that will match the hex digits in a Mach-O file. 1000007 is also only 7 digits, so I don't know whether you are supposed to pad it.

Anyway, this number appears in example.out, right after the magic number:

0700 0001

Hmm, they seem somewhat related:

0700 0001
1000007

It looks like there was a 0 added to the end of 1000007, and that it was reversed.


Question

At this point, having already spent a few hours getting this far, I wanted to ask: how does the structure of the Mach-O header map onto the actual bytes of the file? Can you show where each field of the header appears in the example.out dump above, with a brief explanation of why?


Solution

  • Part of what's confusing you is endianness. In this case, the header is stored in the native format for the platform. Intel-compatible platforms are little-endian systems, meaning the least-significant byte of a multi-byte value is first in the byte sequence.

    So, the byte sequence 07 00 00 01, when interpreted as a little-endian 32-bit value, corresponds to 0x01000007.

    The other thing you need to know to interpret the structure is the size of each field. All of the uint32_t fields are pretty straightforward. They are 32-bit unsigned integers.

    Both cpu_type_t and cpu_subtype_t are defined in machine.h (which you linked) as integer_t, and integer_t is in turn defined as int in /usr/include/mach/i386/vm_types.h. OS X is an LP64 platform, which means that the sizes of long and of pointers depend on the architecture (32- vs. 64-bit), but the size of int does not: it is always 32 bits.

    So, all of the fields are 32 bits or 4 bytes in size. Since there are 8 fields, that's a total of 32 bytes.

    From your original hexdump, here's the part which corresponds to the header:

    cffa edfe 0700 0001 0300 0080 0200 0000
    1000 0000 1005 0000 8500 2000 0000 0000
    

    Broken out by field:

    struct mach_header_64 {
      uint32_t  magic;           cf fa ed fe -> 0xfeedfacf
      cpu_type_t  cputype;       07 00 00 01 -> 0x01000007
      cpu_subtype_t cpusubtype;  03 00 00 80 -> 0x80000003
      uint32_t  filetype;        02 00 00 00 -> 0x00000002
      uint32_t  ncmds;           10 00 00 00 -> 0x00000010
      uint32_t  sizeofcmds;      10 05 00 00 -> 0x00000510
      uint32_t  flags;           85 00 20 00 -> 0x00200085
      uint32_t  reserved;        00 00 00 00 -> 0x00000000
    };