My objective is to do a comparative study of a few instruction set architectures.
For each instruction set architecture, how can I find the most commonly used instructions?
These are the steps I am thinking of:
Here is a very good study on x86 machine code statistics: https://www.strchr.com/x86_machine_code_statistics
I have tried the command below for disassembling, but it does not seem to disassemble properly. The disassembled code shows some das
instructions, which should not be present in actual code.
ndisasm -b32 -a $(which which)
You can try this to gather the mnemonics from the .text section:
objdump --no-show-raw-insn \
-M intel \
-sDj .text $(which *program name*) | # <-- disassemble .text section
sed -n '/<\.text>/, $ p' | # <-- skip raw hex
awk '{$1 = ""; print}' | # <-- remove offsets
sed '1d' # <-- delete annoying <.text> in first line
After that, you can either keep only the mnemonic names by appending awk '{print $1}'
to the previous command, or transform the data in some other way.
Finally, append sort | uniq -c
to the previous steps to count how often each mnemonic occurs.
So my resulting command looked like:
objdump --no-show-raw-insn \
-M intel \
-sDj .text $(which *program name*) |
sed -n '/<\.text>/, $ p' |
awk '{$1 = ""; print}' |
sed '1d' |
awk '{print $1}' | sort | uniq -c
This prints the frequency of every mnemonic in the program's .text section.
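A variation on the pipeline above, offered as a rough sketch rather than a definitive tool: the counts can be ranked so the most common mnemonics come first, and instruction prefixes can be skipped so that they are not counted in place of the instruction itself. The function name count_mnemonics is hypothetical, the prefix list is deliberately partial, and the sketch assumes objdump's usual "address: mnemonic operands" line layout when run with --no-show-raw-insn.

```shell
# Hypothetical helper: reads objdump disassembly on stdin and prints
# "count mnemonic" lines, most frequent first.
count_mnemonics() {
    awk '
        # Keep only instruction lines, whose first field looks like "4000:".
        $1 ~ /^[0-9a-f]+:$/ && NF >= 2 {
            m = $2
            # Assumption: a partial list of prefixes that objdump prints
            # before the real mnemonic; extend it as needed.
            if ((m == "rep" || m == "repz" || m == "repnz" || m == "lock") && NF >= 3)
                m = $3
            print m
        }
    ' | sort | uniq -c | sort -rn
}

# Example use (the program name is a placeholder):
#   objdump -d --no-show-raw-insn -M intel -j .text "$(which ls)" | count_mnemonics | head -n 20
```

Since the awk filter selects instruction lines by their address field, the header lines and symbol labels are dropped without the extra sed steps, and the same function works whether or not the binary is stripped.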