c++, c, reverse-engineering, disassembly, capstone

Capstone cs_disasm disassembles only a small portion of the code


I'm experimenting with Capstone (http://www.capstone-engine.org) on macOS with a macOS x86_64 binary. It more or less works, but I have two concerns.

I'm loading a test dylib:

[self custom_logging:[NSString stringWithFormat:@"Module Path:%@",clientPath]];
NSMutableData *ModuleNSDATA = [NSMutableData dataWithContentsOfFile:clientPath];
[self custom_logging:[NSString stringWithFormat:@"Client Module Size: %lu MB",(ModuleNSDATA.length/1024/1024)]];
[ModuleNSDATA replaceBytesInRange:NSMakeRange(0, 20752) withBytes:NULL length:0];
uint8_t *bytes = (uint8_t*)[ModuleNSDATA bytes];
long size = [ModuleNSDATA length]/sizeof(uint8_t);
[self custom_logging:[NSString stringWithFormat:@"UInt8_t array size: %lu",size]];
ModuleASM = [NSString stringWithCString:disassembly(bytes,size,0x5110).c_str() encoding:[NSString defaultCStringEncoding]];
  1. From my research, it seems I need to trim the leading bytes of the binary to remove the header and metadata, so that disassembly starts at real instructions. However, I'm not sure whether Capstone provides an API for this or whether I need to scan for byte patterns and locate the address of the first instruction myself.

In fact, I've applied a simple workaround: I found a safe address that will certainly contain instructions in most modules I load. I would still like to apply a proper solution.

  2. I've successfully loaded and disassembled part of the module's code using the workaround described above. Sadly, cs_disasm mostly returns no more than 5000-6000 instructions, which is confusing; it seems to stop on regular instructions that it shouldn't stop on. I'm not sure what I'm doing wrong. The module is more than 15 MB of code, so there are far more than 5k instructions to disassemble.

Below is the function, which I based on the Capstone docs example:

string disassembly(uint8_t *bytearray, long size, uint64_t startAddress){
    csh handle;
    cs_insn *insn;
    size_t count;
    string output;
    if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) == CS_ERR_OK){
    count = cs_disasm(handle, bytearray, size, startAddress, 0, &insn);
           printf("\nCOUNT:%lu",count);
        if (count > 0) {
            size_t j;
            for (j = 0; j < count; j++) {
                char buffer[512];
                int i=0;
                i = sprintf(buffer, "0x%" PRIx64":\t%s\t\t%s\n", insn[j].address, insn[j].mnemonic,insn[j].op_str);
                output += buffer;
            }
            cs_free(insn, count);
        } else {
            output = "ERROR: Failed to disassemble given code!\n";
        }
    }
    cs_close(&handle);
    return output;
}

I would really appreciate any help with this.
Warmly,
David


Solution

  • The answer is simply to use SKIPDATA mode. By default cs_disasm stops at the first bytes it cannot decode; with SKIPDATA enabled it emits those bytes as a data pseudo-instruction and keeps going. Capstone is great, but its docs are very bad.

    Working example below. This mode is still quite buggy, so the detection of data sections should preferably be custom code. For me it works fine only with small chunks of code. It does, however, disassemble up to the end of the file.

    string disassembly(uint8_t *bytearray, long size, uint64_t startAddress){
        csh handle;
        cs_insn *insn;
        size_t count;
        string output;
        // Bytes that cannot be decoded are reported with this mnemonic.
        cs_opt_skipdata skipdata = {
           .mnemonic = "db",
        };
        if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) == CS_ERR_OK){
            cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);
            // Keep going past undecodable bytes instead of stopping at them.
            cs_option(handle, CS_OPT_SKIPDATA, CS_OPT_ON);
            cs_option(handle, CS_OPT_SKIPDATA_SETUP, (size_t)&skipdata);
            count = cs_disasm(handle, bytearray, size, startAddress, 0, &insn);
            if (count > 0) {
                size_t j;
                for (j = 0; j < count; j++) {
                    char buffer[512];
                    // snprintf guards against overflowing the line buffer.
                    snprintf(buffer, sizeof(buffer), "0x%" PRIx64":\t%s\t\t%s\n", insn[j].address, insn[j].mnemonic,insn[j].op_str);
                    output += buffer;
                }
                cs_free(insn, count);
            } else {
                output = "ERROR: Failed to disassemble given code!\n";
            }
            // Close the handle only if cs_open succeeded.
            cs_close(&handle);
        }
        return output;
    }
    

    Shame to those trolls who down-voted this question.