assembly x86-64 cpu-architecture disassembly instructions

How to determine default operand size for instruction decoding x86-64


I am currently trying my hand at writing a program that decodes x86-64 instructions into assembly, but I am stuck on determining the default memory/register operand size when dealing with instructions whose operands are not explicit (i.e. they change size based on the current operating mode).

I am aware that each memory segment has its own descriptor table which describes, among many things, the EFER.LMA, CS.L, and CS.D fields. Different combinations of these signal which operating mode a specific segment is in.

I know that each operating mode calls for its own default data/address sizes, which are subject to change based on operand/address-size override prefixes in the instruction's encoding.

My question is how do I know which segment's operating mode to check in order to determine the operand size for a given instruction?

I'll give an example:

[Image: opcode entry for implicit operand of add instruction]

The add instruction variant with opcode = 0x03 has the symbols Gv, Ev.

According to the AMD x64 manual volume 3, G/E means a general purpose register, and the v symbol means "A word, doubleword, or quadword (in 64-bit mode), depending on the effective operand size."

I know the effective operand size is the default operand size specified by the operating mode of some segment (ignoring size overrides), but which segment would I check here? The page that describes the add instruction in detail does not mention a specific segment, so I'm assuming this is something that is not provided on an instruction by instruction basis (except for edge cases).

Here's what I have pieced together so far as to how to approach this:

  1. Assume the instruction targets a default segment. Maybe it's DS for register operands and SS for memory operands. If there is a segment override prefix, use this segment instead.

  2. Fetch this segment's descriptor entry.

  3. Use the fields in the descriptor entry to determine the effective operand size. The effective operand size will be the default unless there is a size override prefix.

  4. This will be the size used for the operand.

Is this approach correct? Feel free to correct anything in my approach/understanding. Thank you very much.


Solution

  • Only the CS segment affects the operand and address sizes. DS, SS, ES, FS, and GS have no effect on either.

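    There are 3 different modes: 16-bit, 32-bit, and 64-bit. They are selected by the D and L bits in the descriptor loaded into CS: L=0, D=0 selects 16-bit; L=0, D=1 selects 32-bit; L=1, D=0 selects 64-bit; L=1, D=1 is reserved.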

    EFER.LMA is a global value (not per-segment); it determines whether the CPU pays attention to the L bit. If EFER.LMA=0, the L bit is ignored (assumed 0), and 64-bit mode is not accessible.
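
    As a rough sketch of the selection rule described above, a decoder could derive the mode from the three bits like this. The function name `cs_mode` and its parameters are made up for illustration; a real decoder would normally just be told the mode by its caller rather than read descriptors.

```python
def cs_mode(efer_lma: bool, cs_l: int, cs_d: int) -> int:
    """Return 16, 32, or 64 for the code segment currently loaded into CS.

    Hypothetical helper: the three arguments are the EFER.LMA flag and the
    L and D bits of the CS descriptor, supplied by the caller.
    """
    if efer_lma and cs_l:
        if cs_d:
            # L=1 together with D=1 is a reserved combination in long mode.
            raise ValueError("CS.L=1 with CS.D=1 is reserved")
        return 64
    # Either long mode is inactive (so L is ignored), or this is a
    # compatibility-mode segment (L=0): D alone selects 16 or 32 bits.
    return 32 if cs_d else 16
```

    Note that none of the data-segment descriptors appear as inputs; only CS matters here.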

    The D bit in data segments does not affect the default operand size. It only affects the upper bound of expand-down segments (0xFFFF or 0xFFFFFFFF), and for the SS segment it determines whether push and pop use the SP or ESP register. In 64-bit mode, this register is always RSP.
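
    Since only the CS mode and the instruction's own prefixes matter, the effective sizes for a "v" operand such as the Gv, Ev form of add can be sketched as below. The name `effective_sizes` and its flags are illustrative assumptions, and the sketch deliberately ignores instructions that default to a 64-bit operand size in 64-bit mode (for example push, pop, and near branches), which need a per-opcode exception.

```python
def effective_sizes(mode, prefix_66=False, prefix_67=False, rex_w=False):
    """Return (operand_bits, address_bits) for a 'v'-sized instruction.

    mode      -- 16, 32, or 64, as selected by the CS descriptor
    prefix_66 -- operand-size override prefix (0x66) present
    prefix_67 -- address-size override prefix (0x67) present
    rex_w     -- REX.W bit set (only meaningful in 64-bit mode)
    """
    if mode not in (16, 32, 64):
        raise ValueError("mode must be 16, 32, or 64")

    if mode == 64:
        # Default operand size stays 32 bits in 64-bit mode.
        # REX.W takes precedence over the 0x66 prefix.
        if rex_w:
            operand = 64
        elif prefix_66:
            operand = 16
        else:
            operand = 32
        address = 32 if prefix_67 else 64
    else:
        # In 16- and 32-bit modes each override prefix toggles
        # between the two sizes; REX prefixes do not exist here.
        other = 32 if mode == 16 else 16
        operand = other if prefix_66 else mode
        address = other if prefix_67 else mode
    return operand, address
```

    For example, in 64-bit mode the bytes `03 C3` would decode as `add eax, ebx`, `66 03 C3` as `add ax, bx`, and `48 03 C3` as `add rax, rbx`.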

    Note that if the code runs under an OS, the descriptor tables are usually not accessible to user programs. You have to use some OS-specific way to ask which mode another process is running in, if the OS provides such a way at all. Most programs, however, start in one mode and never change it. The starting mode is usually identified in the executable file's header.
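
    As one possible illustration of reading the starting mode from a file header, the sketch below assumes the executable is an ELF file and looks at its `e_machine` field. The function name `elf_start_mode` is hypothetical, and other formats (such as PE) store the equivalent information in different fields.

```python
import struct

EM_386 = 3       # e_machine value for 32-bit x86
EM_X86_64 = 62   # e_machine value for x86-64


def elf_start_mode(header: bytes) -> int:
    """Guess the starting mode (32 or 64) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if header[5] != 1:
        # EI_DATA: 1 means little-endian, which is what x86 uses.
        raise ValueError("unexpected byte order for an x86 binary")
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from("<H", header, 18)
    if machine == EM_386:
        return 32
    if machine == EM_X86_64:
        return 64
    raise ValueError("not an x86 or x86-64 binary")
```

    A typical use would be `elf_start_mode(open(path, "rb").read(20))`, with the result passed to the decoder as its fixed mode.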