I'm building a 64-bit Mach-O binary parser for a reverse engineering tool, Ghidra style. I want the program to annotate where we are in the file in human-readable terms, using only the file format's own identifiers.
Let me show you an example:
0x100000004: cf fa ed fe
0x100000008: 0c 00 00 01
0x10000000c: 00 00 00 00
0x100000010: 02 00 00 00
0x100000014: 11 00 00 00
0x100000018: 20 04 00 00
0x10000001c: 85 00 20 00
0x100000020: 00 00 00 00
0x100000024: 19 00 00 00 LC_SEGMENT_64
Here, LC_SEGMENT_64 is printed next to the word where that load command starts. I know this because the LC_SEGMENT_64 identifier is 0x19. But if I do this for every single possible Mach-O identifier, it's going to get messy. How do I implement this in a good way, without using 50 thousand if-else statements?
My code at the moment:
#include <errno.h>
#include <mach-o/loader.h>
#include <mach/machine.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 4
/* No trailing semicolon, so ERROR(...) behaves like a normal statement. */
#define ERROR(msg) fprintf(stderr, "ERROR: %s | %s\n", msg, strerror(errno))
void HexPrinter(uint32_t buffer, FILE *binary) {
    uint64_t mem_addr = 0x10000000;
    fread(&buffer, 1, BUFFER_SIZE, binary);
    if (buffer != MH_MAGIC_64) {
        ERROR("Not a 64-bit Mach-O binary");
        return;
    } else {
        /* Print the four bytes in file order; shifts are 0, 8, 16, 24. */
        printf("0x%llx: %02x %02x %02x %02x\n", mem_addr, (buffer & 0xFF),
               ((buffer >> 8) & 0xFF), ((buffer >> 16) & 0xFF),
               ((buffer >> 24) & 0xFF));
        mem_addr += BUFFER_SIZE; /* advance past the magic word */
    }
    while ((fread(&buffer, 1, BUFFER_SIZE, binary)) == BUFFER_SIZE) {
        printf("0x%llx: %02x %02x %02x %02x\t", mem_addr, (buffer & 0xFF),
               ((buffer >> 8) & 0xFF), ((buffer >> 16) & 0xFF),
               ((buffer >> 24) & 0xFF));
        /* I don't want to write one of these for each identifier */
        if (buffer == LC_SEGMENT_64) {
            printf("LC_SEGMENT_64\n");
        } else {
            printf("\n");
        }
        mem_addr += BUFFER_SIZE;
    }
    if (ferror(binary)) {
        /* main() owns the FILE and closes it, so no fclose() here. */
        ERROR("Error reading file");
        return;
    }
}
int main(int argc, char *argv[]) {
    FILE *binary;
    char *pathname = argv[1];
    uint32_t buffer = 0;
    if (argc < 2) {
        ERROR("Usage: ./nibBrev <pathname>");
        return (-1);
    }
    /* "rb": read the file as binary, without newline translation. */
    binary = fopen(pathname, "rb");
    if (!binary) {
        ERROR("Couldn't open file");
        return (-1);
    }
    HexPrinter(buffer, binary);
    fclose(binary);
    return 0;
}
I read the loader.h file linked from this question for the Mach-O format you are working with, and my answer targets what I read there. If that link is out of date or incorrect, adjust based on the header you are actually working with.
The constants you mention are sequential, starting at #define LC_SEGMENT 0x1 and ending at #define LC_BUILD_VERSION 0x32. Create a mapping from these constants to indices in a table of strings, like this.
/* One slot per identifier listed in loader.h, plus one for the
   unused 0th slot. Derive the size from the last identifier in
   the list of constants, or count by hand. Either works.
   The parentheses keep the macro safe inside larger expressions. */
#define NUM_IDENTIFIERS (LC_BUILD_VERSION + 1)
static const char *const identifier_strings[NUM_IDENTIFIERS] = {
    /* Probably unused 0th slot. */
    [0] = "",
    [LC_SEGMENT] = "LC_SEGMENT",
    [LC_SYMTAB] = "LC_SYMTAB",
    /* ...continues for rest of identifier constants */
    /* Careful with constants OR'd with the LC_REQ_DYLD bit.
       Or mask every value this way to be safe, maybe? */
    [LC_RPATH & ~LC_REQ_DYLD] = "LC_RPATH",
    /* ...continues */
    [LC_BUILD_VERSION] = "LC_BUILD_VERSION",
};
The strings are now always in sync with those constants, and the string table can be reorganized in any way for readability because of the [index] = "string" notation used.
Now a lookup function might look like this.
const char *lookup(uint32_t identifier) {
    /* While the identifiers are sequential, be mindful of
       the LC_REQ_DYLD bit that has been OR'd into some in
       the list. That would mess up the indexing. See the
       comment above this constant for more info. */
    identifier &= ~LC_REQ_DYLD;
    if (identifier && identifier < NUM_IDENTIFIERS) {
        return identifier_strings[identifier];
    }
    return NULL;
}
Then use it the same way as in the other answer.
const char *id = lookup(buffer);
if (id) {
    puts(id);
} else {
    /* puts() takes a string, not a char; putchar() prints the bare newline. */
    putchar('\n');
}
Warning: I assume you are only interested in the LC_* section of identifiers. Both Ted Lyngmo's answer and mine would not work if you added more constants to print from loader.h, because I see the same values used for many different #defines throughout the file.