debugging, low-level

On-the-fly instruction patching


I have created a small debugger for Linux. I tried to build a mechanism that encrypts binary instructions and decrypts them just before execution (using hardware breakpoints or single-stepping). It works 99 times out of 100.

I think the problem is due to the L1 cache: when I try to decrypt an instruction, that instruction is already in the CPU's L1 cache. I have tried on ARM64 and x86_64 and got the same results.

My question is: how do debuggers like gdb or lldb patch breakpoint instructions without L1 cache side effects?

Thanks


Solution

  • The x86/x86_64 is an unusual beast, in that if you write to instruction memory, it will invalidate that line of the caches, so you don't need to do anything. This is for backward compatibility with chips from the 1980s, when there wasn't any caching. This means that for that processor, the L1 cache certainly isn't the problem, unless you are modifying an instruction using a different linear address, as described below:

    https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf

    11.6 SELF-MODIFYING CODE

    A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated.

    The exception is if you are doing something weird with the address mapping (from the same section of the document):

    Systems software, such as a debugger, that might possibly modify an instruction using a different linear address than that used to fetch the instruction, will execute a serializing operation, such as a CPUID instruction, before the modified instruction is executed, which will automatically resynchronize the instruction cache and prefetch queue.

    So on x86_64, you could try executing a CPUID instruction after patching and before the modified instruction runs.

    Except for this case, your problem won't be the caches at all. I'd look elsewhere, for example at race conditions (you haven't said what the non-working case looks like).

    For other modern processors (including ARM), you will need to invalidate the relevant part of the instruction cache yourself.

    https://developer.arm.com/documentation/ddi0344/k/Babhejba
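
As a rough illustration of the two cases above, here is a minimal sketch for a process that patches its own code. It is not how gdb or lldb are implemented; it assumes Linux, GCC or Clang, and a system that allows a read/write/execute anonymous mapping. The names `sync_code`, `write_code` and `patch_demo` are made up for this example. `__builtin___clear_cache` is the compiler builtin for instruction cache maintenance: it should do the required work on ARM and is effectively a no-op on x86. The CPUID call is only the serializing step the Intel manual mentions for the different-linear-address case, and is not needed when writing through the same address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define CODE_PAGE 4096u

typedef int (*int_fn)(void);

#if defined(__x86_64__)
/* mov eax, imm32 ; ret */
static const unsigned char CODE_RET_42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
static const unsigned char CODE_RET_43[] = { 0xB8, 0x2B, 0x00, 0x00, 0x00, 0xC3 };
#elif defined(__aarch64__)
/* mov w0, #imm ; ret */
static const uint32_t CODE_RET_42[] = { 0x52800540u, 0xD65F03C0u };
static const uint32_t CODE_RET_43[] = { 0x52800560u, 0xD65F03C0u };
#else
#error "this sketch only carries machine code for x86_64 and AArch64"
#endif

/* Hypothetical helper: call after writing instructions, before running them. */
static void sync_code(void *start, size_t len)
{
    /* Compiler builtin: performs the cache maintenance the target needs
     * (real work on ARM, nothing on x86). */
    __builtin___clear_cache((char *)start, (char *)start + len);
#if defined(__x86_64__)
    /* Serializing instruction, as suggested by the Intel manual for code
     * modified through a different linear address. Harmless otherwise. */
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)ebx;
    (void)edx;
#endif
}

/* Hypothetical helper: copy instructions into place, then synchronize. */
static void write_code(void *dst, const void *src, size_t len)
{
    assert(len <= CODE_PAGE);
    memcpy(dst, src, len);
    sync_code(dst, len);
}

/* Runs a tiny function, patches it in place, runs it again.
 * Returns 0 on success, -1 if the executable mapping could not be created. */
int patch_demo(int *before, int *after)
{
    void *mem = mmap(NULL, CODE_PAGE, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    int_fn fn = (int_fn)(uintptr_t)mem;

    write_code(mem, CODE_RET_42, sizeof CODE_RET_42);
    *before = fn();

    write_code(mem, CODE_RET_43, sizeof CODE_RET_43);
    *after = fn();

    munmap(mem, CODE_PAGE);
    return 0;
}
```

Calling `patch_demo(&before, &after)` should give 42 and then 43 on both supported architectures. If the `sync_code` call is dropped, the x86_64 build is expected to keep working, while an ARM build may occasionally run the stale instruction, which would match the "99 times out of 100" symptom. This only covers a process modifying its own memory; a debugger writing into another process goes through the kernel instead.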