I have an asm file that was produced with IDA Pro. All of its functions look roughly like this:
; =============== S U B R O U T I N E =======================================
release ; DATA XREF: attribute_manager_create+78↓o
; attribute_manager_create+7C↓o ...
var_30 = -0x30
var_24 = -0x24
arg_0 = 0
arg_4 = 4
MOV R7, R0
LDR R0, [R0,#0x34]
SUB SP, SP, #0x14
MOV R9, R3
LDR R3, [R0]
MOV R5, R1
MOV R8, R2
LDR R0, [R7,#0x30]
ADD R6, SP, #0x30+var_24
LDR R3, [R0,#4]
MOV R4, R0
B loc_7A7C
; ---------------------------------------------------------------------------
loc_7A70 ; CODE XREF: release+5C↓j
LDR R3, [SP,#0x30+var_24]
CMP R3, R5
BEQ loc_7AB4
loc_7A7C ; CODE XREF: release+38↑j
LDR R3, [R4]
MOV R1, R6
MOV R0, R4
CMP R0, #0
BNE loc_7A70
loc_7A94 ; CODE XREF: release+A0↓j
LDR R3, [R4,#8]
MOV R0, R4
LDR R0, [R7,#0x34]
LDR R3, [R0,#0xC]
ADD SP, SP, #0x14
POP {R4-R9,PC}
; ---------------------------------------------------------------------------
loc_7AB4 ; CODE XREF: release+44↑j
LDR R3, [SP,#0x30+arg_4]
STR R3, [SP,#0x30+var_30]
MOV R2, R9
LDR R3, [SP,#0x30+arg_0]
LDR R6, [R5,#4]
MOV R1, R8
MOV R0, R5
B loc_7A94
; End of function release
I want to parse this file and get a dictionary in which each key is a function name and each value is a string built from that function's instructions. I will explain in more detail.
I have a dictionary in which each ARM instruction corresponds to a specific letter.
arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}
When parsing, each instruction should be replaced by its letter. For example, the function above should end up in the dictionary like this:
{'release': 'randanaavncnvat...'}
If the code contains an instruction that is not in arm_dict, then that instruction is skipped.
I've tried to parse the file linearly, using the lines containing "S U B R O U T I N E" and "End of function" as delimiters, but I can't get rid of the instruction operands. I would be glad if someone could provide some sample code or advice.
arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}
FILE_NAME = "ida_output.asm"
result = ""
with open(FILE_NAME) as f:
    lines = f.readlines()
for line in lines:
    words = line.split()
    # if the line is empty, skip it
    if not words:
        continue
    # the first word is the mnemonic; the operands are simply ignored
    if words[0] in arm_dict:
        result += arm_dict[words[0]]
Here's a messy edited version after peter's suggestion:
arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}
FILE_NAME = "ida_output.asm"
def trim(line):
    # drop the trailing IDA comment, if any
    if ";" in line:
        return line.split(";")[0]
    return line
functions = {}
with open(FILE_NAME) as f:
    label = None
    lines = f.readlines()
    for line in lines:
        words = line.split()
        # if the line is empty, skip it
        if not words:
            continue
        first = words[0]
        if label and first in arm_dict:
            functions[label] += arm_dict[first]
        elif first[0] != ";" and (not label or (not line[0].isspace() and "=" not in trim(line) and label not in line)):
            # an unindented non-comment line that does not mention the current function starts a new one
            label = first
            functions[label] = ""
There are lots of potential edge cases where it could fail, but it should do pretty well.
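For comparison, here is a minimal sketch of the marker-based approach mentioned in the question. It assumes that every function starts with a "S U B R O U T I N E" banner, that the first non-comment line after the banner begins with the function name, and that the function ends at an "End of function" comment. The names `parse_functions` and `sample` are made up for this illustration; they are not from IDA or from the code above.

```python
arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}


def parse_functions(lines, mapping):
    """Return {function name: encoded instruction string} (hypothetical helper)."""
    functions = {}
    name = None          # function currently being read, if any
    expect_name = False  # True right after a SUBROUTINE banner
    for raw in lines:
        if "S U B R O U T I N E" in raw:
            name, expect_name = None, True
            continue
        if "End of function" in raw:
            name, expect_name = None, False
            continue
        # Drop the IDA comment, then split what is left into words.
        words = raw.split(";", 1)[0].split()
        if not words:
            continue
        if expect_name:
            # Assumption: the first code line after the banner is the name.
            name = words[0].rstrip(":")
            functions[name] = ""
            expect_name = False
            continue
        # The first word is the mnemonic; operands are never looked at.
        # Anything not in the mapping (labels, var_30 = ..., BNE) is skipped.
        if name is not None and words[0] in mapping:
            functions[name] += mapping[words[0]]
    return functions


sample = """\
; =============== S U B R O U T I N E =====
release ; DATA XREF: attribute_manager_create+78
var_30 = -0x30
MOV R7, R0
LDR R0, [R0,#0x34]
SUB SP, SP, #0x14
loc_7A70 ; CODE XREF: release+5C
CMP R3, R5
BEQ loc_7AB4
POP {R4-R9,PC}
; End of function release
""".splitlines()

print(parse_functions(sample, arm_dict))  # {'release': 'andjws'}
```

Because only the first word of each line is looked up, the operands never need to be parsed. One limitation of this sketch: a local label that happens to be spelled like a mnemonic would be counted as an instruction.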