When decompiling arm64 code, how can one know whether an unconditional branch instruction b is a branch to a label in the same function and not to some other function?
How do state-of-the-art decompilers recognize whether a branch target is still inside the function or starts a new function? Do they rely on the branch target's value and check whether it lands in a different section of the __TEXT segment?
What about branch targets that are in the same section but should still be considered new functions? Is there a rule of thumb for ARM64 saying that if a branch target is farther than some threshold from the current address, it marks a new boundary and thus a new function? In x86, for example, there are different encodings for short and far jumps, and a short jump may be considered a label within the function while a far jump probably isn't.
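For reference: A64 has a single encoding for B, a signed 26-bit word offset, so every b can reach roughly ±128 MiB and the encoding carries no short/far hint comparable to x86's jmp rel8 versus jmp rel32. The target has to be computed and then compared against whatever function bounds are known. A minimal Python sketch of that decoding (decode_b_target and the constant names are illustrative, not taken from any tool):

```python
from __future__ import annotations
from typing import Optional

B_MASK = 0xFC000000    # top 6 bits select the instruction class
B_OPCODE = 0x14000000  # B imm26 (BL would be 0x94000000)

# imm26 counts 4-byte words, so the reachable window is [-128 MiB, +128 MiB - 4].
MAX_BACKWARD = -(1 << 25) * 4
MAX_FORWARD = ((1 << 25) - 1) * 4


def decode_b_target(insn: int, pc: int) -> Optional[int]:
    """Return the target of an unconditional B at address pc, or None if insn is not B."""
    if insn & B_MASK != B_OPCODE:
        return None
    imm26 = insn & 0x03FFFFFF
    if imm26 & (1 << 25):  # sign-extend the 26-bit field
        imm26 -= 1 << 26
    return pc + imm26 * 4
```

For example, decode_b_target(0x14000002, 0x1000) gives 0x1008. Any target inside that window is encodable with the same instruction, so the distance alone doesn't reveal the compiler's intent.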
I might add that the binaries I'm inspecting right now are Mach-O binaries compiled from Objective-C, and I try to validate my findings using Ghidra. So it might be using additional heuristics, like checking whether a jump target is in the __stubs section or the __objc_stubs section, or even analyzing block structures to identify more procedures (although, judging from Ghidra's decompilation, it doesn't seem to identify these structures).
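The stub-section check described above is easy to express: a b whose target falls inside __TEXT,__stubs or __TEXT,__objc_stubs is leaving the function, because those sections hold per-symbol trampolines rather than function bodies. This is a sketch of the idea, not Ghidra's actual implementation; the Section class, helper names, and all addresses below are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Section:
    segment: str
    name: str
    addr: int
    size: int

    def contains(self, address: int) -> bool:
        return self.addr <= address < self.addr + self.size


# Sections that hold stubs/trampolines rather than real function bodies.
STUB_SECTIONS = {("__TEXT", "__stubs"), ("__TEXT", "__objc_stubs")}


def section_of(address: int, sections: Iterable[Section]) -> Optional[Section]:
    """Return the section containing address, or None if it is unmapped."""
    for section in sections:
        if section.contains(address):
            return section
    return None


def targets_stub(target: int, sections: Iterable[Section]) -> bool:
    """True if a branch target lands in a stub section, i.e. it leaves the function."""
    section = section_of(target, sections)
    return section is not None and (section.segment, section.name) in STUB_SECTIONS


# Hypothetical layout; real values come from the Mach-O section headers.
SECTIONS = [
    Section("__TEXT", "__text", 0x100004000, 0x8000),
    Section("__TEXT", "__stubs", 0x10000C000, 0x300),
    Section("__TEXT", "__objc_stubs", 0x10000C300, 0x800),
]
```

A target inside __text itself is not resolved by this check and still needs the heuristics discussed in the answer.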
When decompiling code from arm64, how can one know if an unconditional branch instruction b is a branch to a label in the same function and not to some other function?
You can't really know. You can at best make an educated guess. In my experience, even state-of-the-art decompilers are quite bad at that. The good ones allow the user to manually override such detection.
The core problem is that there is no concept of a "function" at the assembly level. Labels that are exported to higher-level languages like C are expected to conform to a certain ABI, but for anything else, all bets are off. And even with exported symbols, if you're up against a malicious author (consider: malware), they may choose to break this contract as well, in order to obscure how the binary works. But even with non-malicious binaries, if someone builds with -O3 -flto -moutline, which Apple supports for arm64 targets, you'll be in a world of hurt.
Some heuristics that you can use though:
Has x30 been saved somewhere?
Usually this happens via an stp x29, x30, [sp, ...] instruction, together with a decrement of sp. If this is the case, then with b you're likely looking at a branch within the same function. If not, you can't tell whether this is a tail call or a jump within the same function.
Assumption: x30 is used as the link register. An obfuscator could use something else, e.g. adr x17, ...; b _func for the function call, in which case x17 would be the link register. This could also be randomised per function.
Are there any functions in between the current address and the jump target?
Maybe your binary has symbols, maybe LC_FUNCTION_STARTS, maybe your decompilation has already identified function units, etc.
Assumptions: functions have all their basic blocks next to each other, and aren't fragmented and spread out. Usually this is the case, but again, this assumption could be broken as part of an obfuscation technique.
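If the binary carries LC_FUNCTION_STARTS, the in-between check can be sketched as below. As I understand the format, the load command's data is a run of ULEB128 deltas: the first is relative to the __TEXT segment's vmaddr, each later one to the previous start, and a zero delta ends the list. The function names and result labels are illustrative, and the check inherits the contiguity assumption above.

```python
from __future__ import annotations
from bisect import bisect_left, bisect_right


def decode_function_starts(blob: bytes, text_vmaddr: int) -> list[int]:
    """Turn LC_FUNCTION_STARTS data (ULEB128 deltas) into absolute start addresses."""
    starts: list[int] = []
    address = text_vmaddr
    i = 0
    while i < len(blob):
        delta, shift = 0, 0
        while True:
            if i >= len(blob):
                raise ValueError("truncated ULEB128 value")
            byte = blob[i]
            i += 1
            delta |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        if delta == 0:  # terminator / padding
            break
        address += delta
        starts.append(address)
    return starts


def classify_branch(branch_addr: int, target: int, starts: list[int]) -> str:
    """Compare a branch against known function starts (starts must be sorted)."""
    k = bisect_left(starts, target)
    if k < len(starts) and starts[k] == target:
        return "known function start"
    lo, hi = sorted((branch_addr, target))
    # Any start strictly between the branch and its target means a boundary is crossed.
    if bisect_left(starts, hi) > bisect_right(starts, lo):
        return "crosses function start"
    return "no boundary seen"
```

"no boundary seen" only means this heuristic found nothing; it is a hint toward an intra-function branch, not proof.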
Is the jump target jumped to from any other function?
If there are other callsites from code that you have already determined belongs to a different function, then the jump target can't belong to either of those functions.
Caveat: function outlining, see the next point.
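Given branches you have already attributed to functions, finding targets that more than one function jumps to is a small grouping step. The (function, target) input pairs are a hypothetical representation; per the caveat, a hit may be outlined code rather than a separate function in the usual sense.

```python
from __future__ import annotations
from collections import defaultdict
from typing import Hashable, Iterable


def shared_targets(branches: Iterable[tuple[Hashable, int]]) -> dict[int, set]:
    """Map each target address to the set of source functions, keeping only shared ones."""
    owners: dict[int, set] = defaultdict(set)
    for function, target in branches:
        owners[target].add(function)
    return {target: funcs for target, funcs in owners.items() if len(funcs) > 1}
```

For example, shared_targets([("f", 0x10), ("g", 0x10), ("f", 0x20)]) returns {0x10: {"f", "g"}}, so 0x10 can belong to neither f nor g.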
Does the target code respect the ABI you'd expect from a function, both on entry and on exit?
ABI violations should be trivial to detect. ABI conformance would need something like explicit register spilling to give you confidence. But if it conforms to the expected ABI, then it most likely is its own function. If not, then it's probably either part of the function that jumps to it, or code that was outlined.
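One cheap ABI-violation test: under AAPCS64 the caller passes nothing in x9-x15, and x16/x17 may be clobbered by veneers or stubs on the way in, so code that reads any of x9-x17 before writing it is unlikely to be a normally called function entry. The sketch takes a hypothetical pre-decoded straight-line path of (registers read, registers written) per instruction, with registers numbered 0-30, since a real A64 decoder is out of scope here.

```python
from __future__ import annotations
from typing import Iterable

# x9-x17 carry no defined value into a normally called AAPCS64 function.
SCRATCH_REGS = frozenset(range(9, 18))


def undefined_scratch_reads(path: Iterable[tuple[Iterable[int], Iterable[int]]]) -> list[int]:
    """Return the scratch registers read before being written (first offender), or []."""
    written: set[int] = set()
    for reads, writes in path:
        # Within one instruction, sources are read before the destination is written.
        bad = (set(reads) & SCRATCH_REGS) - written
        if bad:
            return sorted(bad)
        written |= set(writes)
    return []
```

A non-empty result points toward "part of the jumping function, or outlined code"; an empty one proves nothing on its own.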
But none of these are guaranteed to hit, and there will always be cases where both options remain possible.