I'm working on an x64 assembler (just 64 bits, at least for now), and I've gotten decently far (I have support for pretty much all instructions, including most extensions), but I have some pretty hefty tech debt in the REX and MOD/RM regions, and I'm looking to remedy that.
Let's take this as an example: it's the function invoked when I need to calculate the REX prefix of a VEX instruction. Frankly, it's hideous: a bunch of switches and fixup ifs which were useful during development but have become a hindrance.
How can I clean this up? Specifically, is there some behaviour in x64 which I can use to simplify the logic in here? (I'm using the VEX version as an example, but I'm looking for a solution which could be applicable to other REX-like prefixes, e.g. EVEX.)
In my current version I'm using the instruction encoding, which can be seen in the Intel x64 docs, or in FC's instruction collection, which has been scraped from the Intel PDFs. I additionally take the registers in the order they appear and encode each one as a single bit (1 = register is extended, 0 = not extended), but together with all the fixup ifs it's just plain ugly.
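For illustration, here is a minimal sketch of that extension-bit scheme, written in Python for brevity. It assumes registers are numbered 0 to 15, so that bit 3 of the number is the "extended" flag and the low three bits go into ModRM/SIB. The names `rex_bits` and `rex_prefix` are hypothetical, and the sketch ignores cases where a REX prefix is required for other reasons (such as accessing SPL, BPL, SIL or DIL).

```python
def rex_bits(reg=0, index=0, base=0):
    """Return (R, X, B): bit 3 of the ModRM.reg, SIB.index and
    ModRM.rm/SIB.base register numbers (1 = extended, 0 = not)."""
    return (reg >> 3) & 1, (index >> 3) & 1, (base >> 3) & 1


def rex_prefix(w, reg=0, index=0, base=0):
    """Build a REX byte with the layout 0100WRXB."""
    r, x, b = rex_bits(reg, index, base)
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# mov rax, r8 (opcode 89 /r): r8 sits in ModRM.reg, rax in ModRM.rm
print(hex(rex_prefix(1, reg=8, base=0)))  # 0x4c
```

VEX stores the same R, X and B bits, only inverted, so a single helper of this kind can feed both prefix builders.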
I've thought about adding some subset encodings, which would aim to solve the issue of fixup ifs, but I'm not sure I could cover them all.
Is there just a better way to do this? If so, what is it? (Note: I've opted not to post this on Code Review Stack Exchange, since I'm trying to solve an issue related to x64 encoding, not just my code.)
I solved it by dividing the problem into several parts.
First, I specified the data structure Ii which describes the instruction.
Then I defined macros that convert the properties of the instruction into this structure, for instance IiAssembleAVX. Their names start with Ii...
Finally, I have prepared a description for each instruction using these previously defined macros, for instance VADDPD.
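The three-part split described above can be sketched roughly as follows. This is not the Ii structure or the IiAssembleAVX macro themselves, only a Python illustration of the same idea: a static per-instruction description plus one generic encoder. `VexDesc`, `encode_vex_rr` and the two table entries are hypothetical names, and only the register-register form is handled (so the X bit is always 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VexDesc:
    """Hypothetical static description of one VEX-encoded instruction form."""
    opcode: int
    mmmmm: int   # opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A
    pp: int      # implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
    w: int = 0
    l: int = 0   # vector length: 0 = 128-bit, 1 = 256-bit


def encode_vex_rr(desc, reg, vvvv, rm):
    """Encode a register-register form; register numbers are 0..15."""
    r, x, b = (reg >> 3) & 1, 0, (rm >> 3) & 1
    # vvvv is stored inverted, followed by L and pp
    low = ((~vvvv & 0xF) << 3) | (desc.l << 2) | desc.pp
    if x == 0 and b == 0 and desc.w == 0 and desc.mmmmm == 1:
        # 2-byte VEX: only inverted R is available, map 0F is implied
        prefix = [0xC5, ((r ^ 1) << 7) | low]
    else:
        # 3-byte VEX: inverted R, X, B plus the map, then W and the rest
        prefix = [0xC4,
                  ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | desc.mmmmm,
                  (desc.w << 7) | low]
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)
    return bytes(prefix + [desc.opcode, modrm])


# VADDPD is VEX.128.66.0F.WIG 58 /r and VEX.256.66.0F.WIG 58 /r
VADDPD_XMM = VexDesc(opcode=0x58, mmmmm=1, pp=1, l=0)
VADDPD_YMM = VexDesc(opcode=0x58, mmmmm=1, pp=1, l=1)

print(encode_vex_rr(VADDPD_XMM, 1, 2, 3).hex())   # c5e958cb  vaddpd xmm1, xmm2, xmm3
print(encode_vex_rr(VADDPD_XMM, 1, 2, 11).hex())  # c4c16958cb  vaddpd xmm1, xmm2, xmm11
```

With this layout the encoder contains no per-instruction branches: the choice between the 2-byte and 3-byte prefix falls out of the description and the register numbers, which is what removes the need for fixup ifs.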