I am currently trying my hand at writing a program that decodes x86-64 instructions into assembly, but I am stuck on determining the default memory/register operand size when dealing with instructions whose operands are not explicit (i.e. they change size based on the current operating mode).
I am aware that each memory segment has its own descriptor table which describes, among many things, the EFER.LMA, CS.L, and CS.D fields. Different combinations of these signal which operating mode a specific segment is in.
I know that each operating mode has its own default operand and address sizes, which can be changed by operand-size or address-size override prefixes in the instruction's encoding.
My question is how do I know which segment's operating mode to check in order to determine the operand size for a given instruction?
I'll give an example:
The add instruction variant with opcode 0x03 has the operand symbols Gv, Ev.
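According to the AMD x64 manual volume 3, G means a general-purpose register (selected by the ModRM reg field), E means a general-purpose register or memory operand (selected by the ModRM r/m field), and the v symbol means "A word, doubleword, or quadword (in 64-bit mode), depending on the effective operand size."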
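To make the "v" concrete for a decoder: once the effective operand size is known, the same ModRM byte names a different register. The following is a minimal Python sketch for the register forms of Gv and Ev only; the register tables and the function names gv_register and ev_register are hypothetical helpers, and memory forms of Ev (ModRM.mod != 11) are deliberately left out.

```python
# Hypothetical register tables, indexed by a 4-bit register number:
# the 3-bit ModRM field, with REX.R / REX.B supplying bit 3 in 64-bit mode.
GPR = {
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di",
         "r8w", "r9w", "r10w", "r11w", "r12w", "r13w", "r14w", "r15w"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"],
    64: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"],
}


def gv_register(modrm: int, opsize: int, rex_r: bool = False) -> str:
    """Gv operand: general-purpose register selected by ModRM.reg (bits 5..3)."""
    reg = (modrm >> 3) & 0b111
    if rex_r:
        reg |= 0b1000
    return GPR[opsize][reg]


def ev_register(modrm: int, opsize: int, rex_b: bool = False) -> str:
    """Ev operand, register form only (ModRM.mod == 0b11, r/m in bits 2..0)."""
    if modrm >> 6 != 0b11:
        raise ValueError("memory operand: ModRM/SIB addressing not handled here")
    rm = modrm & 0b111
    if rex_b:
        rm |= 0b1000
    return GPR[opsize][rm]
```

For example, the bytes 03 C1 decode as add eax, ecx with a 32-bit operand size, as add ax, cx with a 16-bit one, and as add rax, rcx with a 64-bit one (48 03 C1 in 64-bit mode).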
I know the effective operand size is the default operand size specified by the operating mode of some segment (ignoring size overrides), but which segment would I check here? The page that describes the add instruction in detail does not mention a specific segment, so I'm assuming this is not specified on an instruction-by-instruction basis (except for edge cases).
Here's what I have pieced together so far as to how to approach this:
1. Assume the instruction targets a default segment; maybe it's DS for register operands and SS for memory operands. If there is a segment override prefix, use that segment instead.
2. Fetch this segment's descriptor entry.
3. Use the fields in the descriptor entry to determine the default operand size. The effective operand size is this default unless there is a size override prefix.
4. Use this effective size for the operand.
Is this approach correct? Feel free to correct anything in my approach/understanding. Thank you very much.
Only the CS segment affects the operand and address sizes. DS, SS, ES, FS, and GS have no effect on either.
There are 3 different modes: 16-bit, 32-bit, and 64-bit. They are selected by the D and L bits in the descriptor loaded into CS:
- L=0, D=0: 16-bit mode (default operand and address size 16-bit). Prefix 66 before the instruction changes the operand size to 32-bit, and prefix 67 changes the address size to 32-bit. 64-bit sizes are not available.
- L=0, D=1: 32-bit mode (default operand and address size 32-bit). Prefix 66 changes the operand size to 16-bit, and prefix 67 changes the address size to 16-bit. 64-bit sizes are not available.
- L=1, D=0: 64-bit mode (default operand size 32-bit, default address size 64-bit). Prefix 66 changes the operand size to 16-bit, the REX.W prefix changes it to 64-bit, and prefix 67 changes the address size to 32-bit. A 16-bit address size is not available.

EFER.LMA is a global value (not per-segment); it controls whether the CPU pays attention to the L bit. If EFER.LMA=0, the L bit is ignored (treated as 0), and 64-bit mode is not accessible.
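The mode table above translates fairly directly into code. Below is a minimal Python sketch, assuming the decoder already knows EFER.LMA and the L and D bits of the CS descriptor (for a user-mode disassembler these usually come from the executable's format rather than from the live descriptor). The function names cs_mode and effective_sizes are hypothetical, and per-instruction exceptions (for example near branches and push/pop defaulting to 64-bit operands in 64-bit mode) are not modelled.

```python
from typing import NamedTuple


class Sizes(NamedTuple):
    operand: int  # effective operand size in bits
    address: int  # effective address size in bits


def cs_mode(efer_lma: bool, cs_l: bool, cs_d: bool) -> int:
    """Return 16, 32 or 64: the mode selected by the descriptor loaded in CS."""
    if efer_lma and cs_l:
        if cs_d:
            raise ValueError("CS.L=1 with CS.D=1 is reserved")
        return 64
    # With EFER.LMA=0 the L bit is ignored, so only D matters.
    return 32 if cs_d else 16


def effective_sizes(mode: int, has_66: bool, has_67: bool,
                    rex_w: bool = False) -> Sizes:
    """Apply the 66/67 prefixes and REX.W to the mode's default sizes."""
    if mode == 64:
        # Defaults: 32-bit operands, 64-bit addresses. REX.W wins over 66.
        if rex_w:
            operand = 64
        elif has_66:
            operand = 16
        else:
            operand = 32
        address = 32 if has_67 else 64
        return Sizes(operand, address)
    if rex_w:
        # Bytes 40..4F are INC/DEC outside 64-bit mode, not REX prefixes.
        raise ValueError("REX.W only exists in 64-bit mode")
    other = 32 if mode == 16 else 16  # 66/67 toggle between 16 and 32
    return Sizes(other if has_66 else mode, other if has_67 else mode)
```

For instance, cs_mode(False, False, True) gives 32, and effective_sizes(32, has_66=True, has_67=False) gives Sizes(operand=16, address=32), which is why 66 03 C1 disassembles as add ax, cx in a 32-bit code segment.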
The D bit in data segments does not affect the default operand size. It only affects the upper bound of grow-down segments (0xFFFF or 0xFFFFFFFF), and for the SS segment it determines whether push and pop use the SP or ESP register. In 64-bit mode, this register is always RSP.
Note that if the code runs under an OS, the descriptor table is usually not accessible to user programs. You have to use some OS-specific way to ask which mode another process is running in, if the OS provides such a way at all. Most programs, however, start in one mode and never change it. The starting mode is usually identified in the executable file's header.
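For example, on Linux executables are usually ELF files, and the header's e_machine field tells a disassembler which mode to start in. Below is a minimal Python sketch assuming an ELF input (Windows PE files carry an analogous Machine field in the COFF file header, 0x14C for i386 and 0x8664 for x86-64, which is not handled here); the function name elf_start_mode is hypothetical. It checks e_machine rather than the 32/64-bit ELF class because x32 binaries are ELF32 files that nevertheless run in 64-bit mode.

```python
import struct

EM_386 = 3      # e_machine value for 32-bit x86
EM_X86_64 = 62  # e_machine value for x86-64 (also used by x32 binaries)


def elf_start_mode(header: bytes) -> int:
    """Return the code-segment mode (32 or 64) an x86 ELF executable starts in."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64 headers.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    if machine == EM_X86_64:
        return 64
    if machine == EM_386:
        return 32
    raise ValueError(f"not an x86 executable (e_machine={machine})")
```

Typical use would be reading the first 20 bytes of the file, e.g. `with open(path, "rb") as f: mode = elf_start_mode(f.read(20))`.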