I found this recently: https://github.com/xoreaxeaxeax/movfuscator
It seems to be contingent on the fact that mov
is turing-complete. Is that true, and why?
Yes, x86's mov
is Turing complete. I added that tag to your question because it may not be true for other ISAs with an instruction called mov
, and the movfuscator compiler only targets x86.
It's not "mov" itself doing computation, it's x86 addressing modes which can do addition (and bit-shift). I haven't looked in detail at how it works, but it relies a lot on lookup tables, and stuff like mov eax, [base + eax*4]
loading one of two possible values depending on EAX being 0 or 1.
Also remember that x86 mov
has several forms: between memory and register (load, store, or reg<-reg), or immediate to memory or register. And with addressing modes including absolute and register indirect.
I don't think it relies on self-modifying code, but correct me if I'm wrong. (And if it does, I think/hope movfuscator at least won't create instructions other than mov
. That would be cheating.)
Also, it's only sort of true; you need some way to loop the main program, assuming the original source isn't straight-line with no loops - the Movfuscator github readme talks about this:
While Dolan's paper required a jmp instruction, the M/o/Vfuscator does not - it uses a faulting mov instruction to achieve the infinite execution loop. If you're worried that this is still "jumping", the same effect could be achieved through pages aliased to the same address, wrapping execution around the upper range of memory, ring 0 exception handling, or simply repeating the mov loop indefinitely. A jmp is currently used to dispatch external functions - if this is a problem, avoid using external functions, or compile libraries with the M/o/Vfuscator as well.
When creating a user-space environment for mov-only code to run, I assume it creates a SIGSEGV handler (on POSIX OSes) that restarts execution from the top. So any faulting load can restart the main loop.
Letting execution wrap around is also mentioned as a possibility:
Wraparound of IP can work well in 16-bit mode where IP is 16-bit but CS:IP form a 20-bit linear address (real mode) or in 16-bit protected mode a 64k window somewhere in linear address space. i.e. you can have a 64k block of instructions in only part of your address space, with other space left for data. The DS segment can use a different base. (32-bit addressing modes in 16-bit mode are possible so you still have the full power of any register and scaled-index addressing mode.) Note that the mnemonic for reading and writing segment registers is also mov
.
But harder in 32-bit mode where EIP is 32-bit and so are linear addresses after seg+off calculation. Unless there's some other trick, wraparound can only happen across the entire address space, regardless of what you do with segmentation. This leaves no non-code space for data. Setting a lower segment limit could make code-fetch fault, but that doesn't cause wraparound (unless you set a signal handler, or on bare metal set an interrupt handler address).
And either way, even x86-64 only has 64-bit pointers (in theory; 48 or 57-bit in practice), so space is finite, unlike a real Turing machine with infinite tape.