x86-64 executable cpu-architecture binaryfiles gem5

How can I edit a binary file to change the machine code of the program?


I am trying to add a custom instruction to the x86_64 (amd64) ISA. I want to create a custom binary file for this. I want to add a custom instruction in the file which I will run on the gem5 simulator. I just want to test out this approach first, and hence I don't want to fiddle around with the compiler to generate such binary files. My intention is to first test out if the custom instruction is working as expected and to then try to modify LLVM to emit such instructions for the test code that I would be compiling.

So for now, I just want to insert some binary data in an executable file without modifying the compiler. My plan is to figure out the parts of X86_64 ISA which are not being used and then assign those parts to my custom instruction. Is there a tool available which would let me do this? Ideally, I don't want to edit the file in a general hex editor as it would be helpful to know after what instruction I have added my custom instruction. So I need something which could tell me what parts of the binary correspond to what instructions so that it is easier for me to find my way around the file.


Solution

  • Editing a binary is the hardest way to achieve what you want.

    Instead, use an assembler to emit the bytes you need. All the mainstream assemblers support emitting arbitrary bytes (e.g. `db` in NASM, `.byte` in GAS), and you can even define macros to make it look like they support your new mnemonics.
    Furthermore, you can fully control which instructions are before and after your custom ones and you can put them right at the entry-point.
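
As a rough sketch of the macro approach, the Python helper below generates a small GNU assembler (AT&T syntax) source file in which a macro stands in for the new mnemonic and the instruction sits right at the entry point. The mnemonic `mycustom` and the bytes `0x0f, 0x04` are placeholders chosen for illustration only, not a claim that this encoding is actually free; the exit sequence assumes the Linux x86-64 syscall convention.

```python
def build_test_source(mnemonic: str, encoding: bytes) -> str:
    """Return GAS source that wraps raw instruction bytes in a macro."""
    byte_list = ", ".join(f"0x{b:02x}" for b in encoding)
    lines = [
        f".macro {mnemonic}",          # the macro plays the role of the new mnemonic
        f"    .byte {byte_list}",      # raw bytes of the custom instruction
        ".endm",
        "",
        "    .globl _start",
        "    .text",
        "_start:",
        f"    {mnemonic}",             # custom instruction is the first thing executed
        "    mov $60, %eax",           # Linux x86-64 syscall number for exit
        "    xor %edi, %edi",          # exit status 0
        "    syscall",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical mnemonic and placeholder encoding, for illustration only.
source = build_test_source("mycustom", bytes([0x0F, 0x04]))
```

Writing `source` to a file such as `test.s` and running it through `as` and `ld` should give a static binary in which you know exactly which instructions surround the custom one.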

    If you would like to use a higher-level programming language for printing and the like, you could use inline assembly.
    If your language doesn't support inline assembly, consider rewriting the tests in one that does (testing code is usually short and simple, which makes it easy to port).

    Editing a binary is problematic unless you leave space for your new instructions. Shifting a stream of instructions down is not so easy. All the relative references may need to be computed again and the tool needs to know how to parse and edit all the metadata (e.g. relocations) to update them too.
    I'm not aware of a tool that can do that.
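
To illustrate why shifting instructions is fragile, here is a small worked example of how one relative displacement changes when bytes are inserted between a jump and its target. The addresses and sizes are made up; the only fact relied on is that an x86 relative jump stores its target as an offset from the address of the next instruction.

```python
def rel32(insn_addr: int, insn_len: int, target: int) -> int:
    """Displacement stored in a relative jump: target minus next-instruction address."""
    return target - (insn_addr + insn_len)


def shifted(addr: int, insert_at: int, inserted: int) -> int:
    """Address of a byte after `inserted` bytes are added at `insert_at`."""
    return addr + inserted if addr >= insert_at else addr


# Hypothetical layout: a 5-byte `jmp rel32` at 0x1000 that targets 0x1040.
jmp_addr, jmp_len, target = 0x1000, 5, 0x1040
old_disp = rel32(jmp_addr, jmp_len, target)

# Insert a 4-byte custom instruction at 0x1020, between the jump and its target.
insert_at, inserted = 0x1020, 4
new_disp = rel32(
    shifted(jmp_addr, insert_at, inserted),
    jmp_len,
    shifted(target, insert_at, inserted),
)

print(hex(old_disp), hex(new_disp))
```

Every jump, call and RIP-relative operand that crosses the insertion point needs the same fix-up, on top of any relocations and other metadata that record addresses.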

    If you edit the binary anyway, you have to use a disassembler to find where to put the new instructions, then convert the address to an offset in the file (there are tools for this, or it can easily be done by hand) and then edit the file with a hex editor (knowing that you are overwriting the subsequent instructions).
    You can create room for your custom instructions with a nop-sled (if you can generate it) or with useless statements (whose instructions will eventually be overwritten and padded with nops).
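
The address-to-offset conversion and the overwrite can also be scripted. The sketch below assumes a little-endian ELF64 file and a sled made only of single-byte `0x90` NOPs (compilers and assemblers may emit multi-byte NOPs instead, which this check would reject). The function names are illustrative and not part of any existing tool.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def vaddr_to_offset(image: bytes, vaddr: int) -> int:
    """Map a virtual address to a file offset using the ELF64 program headers."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", image, 32)            # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)  # entry size and count
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, _memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    raise ValueError("address is not backed by file contents")


def patch_into_nops(image: bytearray, offset: int, sled_len: int, payload: bytes) -> None:
    """Overwrite the start of a NOP sled; the untouched NOPs remain as padding."""
    if len(payload) > sled_len:
        raise ValueError("payload does not fit in the sled")
    if any(b != 0x90 for b in image[offset:offset + sled_len]):
        raise ValueError("region is not made of single-byte NOPs")
    image[offset:offset + len(payload)] = payload
```

A typical use would be to read the executable into a `bytearray`, call `vaddr_to_offset` with the sled address reported by the disassembler, call `patch_into_nops` with the custom instruction bytes, and write the buffer back to a new file.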