python assembly x86 x86-emulation

How can I properly emulate x86 with Unicorn in Python?


Background / Explanation of What I'm Trying to Accomplish

I'm currently working on a little malware analysis project and am trying to implement a string decryptor that I wrote using Unicorn. In order to condense things and make the code easier to review, I made a smaller example below from my larger codebase.

What I'm doing is extracting snippets of x86 that represent small string decryption routines. Each is a series of mov instructions whose values are eventually XORed, resulting in a plaintext string. In the comments I've noted which string each snippet should produce. In the following example, the uncommented X86_CODE64 instructions are emulated, but I only get hpe.com when I read from the stack address. (Hint: to view the output, run strings on asdf.txt.) I would expect to see both apple.com and hpe.com.

Question

Based on the code below, is there something I'm doing incorrectly, or not doing at all, that would cause the following code snippets to not decrypt the strings properly?

Disclaimer: This is my first time using Unicorn, so if I'm not articulating clearly or having some trouble explaining, I apologize in advance!

#!/usr/bin/python

from __future__ import print_function
from unicorn import *
from unicorn.x86_const import *

# code to be emulated
# Strings should include apple.com and hpe.com
X86_CODE64 = b'\xc7D$<\xa9GY\x01\xc7D$@\xa2XQ/\x8bD$<\x8aD$8\x84\xc0u\x19H\x8b\xcb\x8bD\x8c<5\xc17</\x89D\x8c<H\xff\xc1H\x83\xf9\x02r\xeaE3\xc0H\x8dT$<H\x8b\xcf\xe8<\xd2\xfe\xff\x88]\xa4\xc7E\xa8\x86/\x00v\xc7E\xac\x82q\x13u\xc7E\xb0\x8a_p\x1a\x8bE\xa8\x8aE\xa4\x84\xc0u\x19H\x8b\xcb\x8bD\x8d\xa85\xe7_p\x1a'

# Strings should be svchost.exe
#X86_CODE64 = b"\xba\xe7_p\x1a\xc7D$|\x94)\x13r\xc7E\x80\x88,\x044\xc7E\x84\x82'\x15:\x89U\x88\x8bD$|\x8aD$x\x84\xc0u\x16H\x8b\xcf\x8bD\x8c|3\xc2"

# Strings should be apple.com
#X86_CODE64 = b'\xc7E\xa8\x86/\x00v\xc7E\xac\x82q\x13u\xc7E\xb0\x8a_p\x1a\x8bE\xa8\x8aE\xa4\x84\xc0u\x19H\x8b\xcb\x8bD\x8d\xa85\xe7_p\x1a'


# Set up Unicorn
ADDRESS = 0x10000000
STACK_ADDRESS = 0x90000
mu = Uc(UC_ARCH_X86, UC_MODE_64)
mu.mem_map(ADDRESS, 4 * 1024 * 1024)
mu.mem_map(STACK_ADDRESS, 4096*10)

# Write code to memory
mu.mem_write(ADDRESS, X86_CODE64)
# Initialize Stack for functions
mu.reg_write(UC_X86_REG_ESP, STACK_ADDRESS + 4096)
mu.reg_write(UC_X86_REG_EDX, 0x0000)

# Run the code
try:
    mu.emu_start(ADDRESS, ADDRESS + len(X86_CODE64), timeout=10000)
except UcError as e:
    pass

#a = mu.mem_read(ADDRESS, 4 * 1024 * 1024)
#print(a)
b = mu.mem_read(STACK_ADDRESS, 4096*10)

with open('asdf.txt', 'ab') as fp:
    fp.write(b)


Solution

  • There are a few problems with this code.

    First of all, you probably never want to swallow all exceptions by writing pass in your except block, at least not at the top level. It would be good to at least print them to the console so you know when something unexpected happens. If you did that, you would notice that Unicorn throws Invalid memory fetch (UC_ERR_FETCH_UNMAPPED) while executing the code.

    If you analyze the bytes, you will notice a strange call in the middle of the first snippet:

    40: e8 3c d2 fe ff          call   0xfffffffffffed281
    

    This call comes right after the hpe.com decryption. Its target is not mapped, so Unicorn stops executing there and never gets to the second part of the code. There's probably a better way to handle this in Unicorn, but for now let's just NOP the call (replace its 5 bytes with 5x \x90). This still won't produce the expected apple.com string, as the code has more problems. The second part (after the call) uses RBP rather than RSP, and you are not setting RBP in your code.
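    As a rough illustration of the call problem, the sketch below decodes the call's rel32 displacement and then overwrites the instruction with NOPs, using only the standard library. The snippet is the question's first X86_CODE64 re-encoded as hex; ADDRESS and the 4 MiB mapping size are taken from the question's script, and the other names are made up for this example.

```python
import struct

# The question's first snippet, re-encoded as hex (same 115 bytes).
X86_CODE64 = bytes.fromhex(
    "c744243ca9475901c7442440a258512f"   # two mov dword [rsp+...], imm32
    "8b44243c8a44243884c07519488bcb"      # test a flag byte, rcx = rbx
    "8b448c3c35c1373c2f89448c3c"          # load, xor, store
    "48ffc14883f90272ea"                  # inc rcx, cmp rcx, 2, jb (loop)
    "4533c0488d54243c488bcf"              # register setup before the call
    "e83cd2feff"                          # call rel32, at offset 0x40
    "885da4c745a8862f0076c745ac82711375c745b08a5f701a"  # second string setup
    "8b45a88a45a484c07519488bcb"          # test a flag byte, rcx = rbx
    "8b448da835e75f701a"                  # load, xor (snippet ends here)
)

CALL_OFFSET = 0x40           # offset of the e8 opcode
CALL_LENGTH = 5              # e8 opcode plus a 32-bit relative displacement
ADDRESS = 0x10000000         # load address used in the question's script
CODE_SIZE = 4 * 1024 * 1024  # size of the code mapping in the question's script

assert X86_CODE64[CALL_OFFSET] == 0xE8

# The displacement is relative to the address of the instruction after the call.
rel32 = struct.unpack_from("<i", X86_CODE64, CALL_OFFSET + 1)[0]
call_target = (ADDRESS + CALL_OFFSET + CALL_LENGTH + rel32) & 0xFFFFFFFFFFFFFFFF
target_is_mapped = ADDRESS <= call_target < ADDRESS + CODE_SIZE
print(hex(call_target), target_is_mapped)  # 0xffed281 False

# Replace the 5-byte call with 5 NOPs so execution falls through.
patched = (
    X86_CODE64[:CALL_OFFSET]
    + b"\x90" * CALL_LENGTH
    + X86_CODE64[CALL_OFFSET + CALL_LENGTH:]
)
```

    With the code loaded at 0x10000000, the call lands just below the mapped region, which matches the unmapped fetch error. The patched buffer has the same length as the original, so no other offsets move.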

    So we need to set RBP as well:

    mu.reg_write(UC_X86_REG_RBP, STACK_ADDRESS + 4096)
    

    And here's another problem. You are setting up Unicorn for 64-bit mode, yet you initialize the 32-bit registers ESP and EDX. Is this on purpose? In your case it's probably not a problem, but you should probably initialize the 64-bit registers (RSP, RDX) instead.

    After setting RBP to some stack address, you still won't see the second string, because the code is cut off too early. The last instructions are a read and an xor:

    6a: 8b 44 8d a8             mov    eax,DWORD PTR [rbp+rcx*4-0x58]
    6e: 35 e7 5f 70 1a          xor    eax,0x1a705fe7
    

    but there's no store, no increment to the next part and no loop.
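    To see what the finished loops are meant to produce, the decryption can be modeled in plain Python. This is only a sketch of the arithmetic, not of the emulation: the constants are the immediates of the mov and xor instructions in the first snippet, and xor_decrypt is a helper name made up here. Without a store instruction, the XORed value only ever lives in EAX, which is why nothing readable shows up in memory.

```python
import struct

def xor_decrypt(dwords, key):
    """XOR each 32-bit word with key and return the little-endian bytes."""
    return b"".join(struct.pack("<I", d ^ key) for d in dwords)

# First string: two dwords at [rsp+0x3c], XOR key 0x2f3c37c1.
hpe = xor_decrypt([0x015947A9, 0x2F5158A2], 0x2F3C37C1)

# Second string: three dwords at [rbp-0x58], XOR key 0x1a705fe7.
apple = xor_decrypt([0x76002F86, 0x75137182, 0x1A705F8A], 0x1A705FE7)

print(hpe.rstrip(b"\x00"))    # b'hpe.com'
print(apple.rstrip(b"\x00"))  # b'apple.com'
```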

    Maybe you copied too few bytes. The missing bytes are: 89448da8 for the store (mov DWORD PTR [rbp+rcx*4-0x58],eax), 48ffc1 for inc rcx, 4883f903 for cmp rcx, 0x3, and lastly 72ea for jb -0x16, which jumps back to the load.
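    As a quick sanity check on those bytes, the sketch below appends them to the tail of the snippet (the load and the xor) and confirms that the jb displacement lands on the load instruction. The variable names are illustrative only.

```python
# Tail of the first snippet as posted: it ends with the load and the xor.
tail = bytes.fromhex(
    "8b448da8"    # mov eax, DWORD PTR [rbp+rcx*4-0x58]
    "35e75f701a"  # xor eax, 0x1a705fe7
)

# The bytes that complete the loop.
missing = bytes.fromhex(
    "89448da8"    # mov DWORD PTR [rbp+rcx*4-0x58], eax
    "48ffc1"      # inc rcx
    "4883f903"    # cmp rcx, 0x3
    "72ea"        # jb -0x16
)
completed = tail + missing

# A rel8 displacement is relative to the end of the jb instruction.
disp = int.from_bytes(completed[-1:], "little", signed=True)
jump_target = len(completed) + disp
print(disp, jump_target)  # -22 0
```

    A target of 0 is the first byte of tail, so each iteration repeats the load, xor, store and increment until rcx reaches 3.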

    So in total, your first snippet is missing the following bytes: 89448da848ffc14883f90372ea (plus NOPing the call), and with that

    ❯ python3 program.py
    ❯ strings asdf.txt
    apple.com
    hpe.com

    you get what's expected.
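    If the strings utility isn't at hand, a rough Python stand-in can scan the dump instead. In the sketch below, printable_strings is a hypothetical helper, and the buffer is a hand-built stand-in for the stack dump, not real emulator output. It assumes RSP and RBP were both set to STACK_ADDRESS + 4096, so the strings sit at offsets 0x1000 - 0x58 and 0x1000 + 0x3c.

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

# Hand-built stand-in for the 40 KiB stack dump (assumed layout, see above).
dump = bytearray(4096 * 10)
dump[0xFA8:0xFA8 + 9] = b"apple.com"   # [rbp-0x58]
dump[0x103C:0x103C + 7] = b"hpe.com"   # [rsp+0x3c]

print(printable_strings(bytes(dump)))  # ['apple.com', 'hpe.com']
```

    To scan the real output, the bytes returned by mu.mem_read can be passed in after converting them with bytes().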

    I briefly checked the second and third snippets: they don't appear to contain a call, but they are missing the store, inc, and loop part too.