I'm writing an 8086 assembler that takes instructions and produces 8086 machine code. I use the "Intel 8086 User Manual" as a reference.
To make it clear, I will explain the situation. Let's say I want to assemble the instruction mov ax, bx. Looking it up in the manual, I find that when the operands of mov are two 16-bit registers, the opcode is 0x89, and mov is then followed by a ModRegRm byte that specifies the source and the destination, which in this case is 0xd8, or 11011000 in binary.
The Mod field is 2 bits, and the Reg and Rm fields are 3 bits each, so Mod = 11, Reg = 011, Rm = 000. That part is straightforward, but there is something I don't understand: the addressing modes and the displacement.
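The field split described above can be sketched in Python. This is only an illustration: split_modrm is a hypothetical helper name, not something taken from the manual.

```python
def split_modrm(byte):
    """Split a ModRegRm byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # top 2 bits
    reg = (byte >> 3) & 0b111   # middle 3 bits
    rm = byte & 0b111           # low 3 bits
    return mod, reg, rm


# The ModRegRm byte of mov ax, bx from the example above.
mod, reg, rm = split_modrm(0xD8)
print(f"Mod = {mod:02b}, Reg = {reg:03b}, Rm = {rm:03b}")
```

Running it on 0xd8 should give back the same three fields listed above.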
Look at the table and the three following instructions and their machine code.
mov [bx+0x6], ax ;894706
mov [bx+0xbf],ax ;8987BF00
mov [bx+0xffff],ax ;8947FF
Am I wrong in assuming that the displacement lengths in these instructions are 8-bit, 8-bit, and 16-bit, respectively? I think I'm right because it seems obvious: 0x6 and 0xbf are 1 byte each, and 0xffff is two bytes.
The question is: why is the MOD field in the second instruction 10b (0x02) instead of 01b (0x01)? It should be 0x01 because the displacement is an 8-bit displacement, shouldn't it? And why is the MOD 0x01 in the third instruction even though the displacement is 16-bit? Why did the assembler ignore the rest of the displacement and keep only 1 byte?
The size of the displacement depends on the MOD field (8 bits if MOD=01b, 16 bits if MOD=10b), and an 8-bit displacement is sign extended to 16 bits before it is used.
This means that an instruction like mov [bx+6], ax could be encoded as mov [bx+0x0006], ax (with MOD=10b and a 16-bit displacement) or as mov [bx+0x06], ax (with MOD=01b and an 8-bit displacement).
In the same way, mov [bx+65535],ax could be encoded either way (with an 8-bit displacement or a 16-bit displacement), because 0xFF can be sign extended to 0xFFFF.
However, mov [bx+191],ax can't be encoded with an 8-bit displacement, because when 191 (0xBF) is sign extended it becomes 0xFFBF, which is not equal to 191. It must use a 16-bit displacement.
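To see the sign extension at work on the three displacement bytes discussed here, a small Python sketch may help. The name sign_extend8 is made up for this illustration.

```python
def sign_extend8(byte):
    """Sign extend an 8-bit value to 16 bits, the way a disp8 is widened."""
    byte &= 0xFF
    # If bit 7 is set, the upper byte of the 16-bit result is all ones.
    return (byte | 0xFF00) if (byte & 0x80) else byte


for disp8 in (0x06, 0xFF, 0xBF):
    print(f"0x{disp8:02X} -> 0x{sign_extend8(disp8):04X}")
```

Only 0x06 and 0xFF come back as the displacement that was originally asked for (0x0006 and 0xFFFF); 0xBF comes back as 0xFFBF rather than 0x00BF.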
Essentially, if the highest 9 bits of the full 16-bit displacement are all the same (all clear for values 0x0000 to 0x007F, or all set for values 0xFF80 to 0xFFFF), it can be encoded with either an 8-bit or a 16-bit displacement; otherwise it must use a 16-bit displacement.
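As a rough sketch of how an assembler might apply this rule, the following Python function encodes only the single form mov [bx+disp], ax and always prefers the shorter displacement. The function name is hypothetical, and the MOD=00b form (no displacement at all) is deliberately left out to keep the example small.

```python
def encode_mov_bx_disp_ax(disp):
    """Encode mov [bx+disp], ax, preferring the shortest displacement.

    Illustrative only: handles just this one instruction form and never
    emits the MOD=00b (no displacement) encoding.
    """
    disp &= 0xFFFF
    opcode = 0x89            # mov r/m16, r16
    reg, rm = 0b000, 0b111   # reg = ax, rm = [bx + disp]
    top9 = disp >> 7         # highest 9 bits of the 16-bit displacement
    if top9 in (0, 0x1FF):   # all clear or all set: disp8 is enough
        mod = 0b01
        disp_bytes = bytes([disp & 0xFF])
    else:                    # otherwise a full 16-bit displacement is needed
        mod = 0b10
        disp_bytes = disp.to_bytes(2, "little")
    modrm = (mod << 6) | (reg << 3) | rm
    return bytes([opcode, modrm]) + disp_bytes


for disp in (0x6, 0xBF, 0xFFFF):
    print(f"mov [bx+0x{disp:x}], ax ; {encode_mov_bx_disp_ax(disp).hex().upper()}")
```

For the three displacements from the question, this should reproduce the byte sequences the asker's assembler produced.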
When there's a choice between different encodings, a good assembler will choose the smallest one (using an 8-bit displacement makes the instruction 1 byte shorter). An even better assembler may use the larger version if it avoids the need for padding (when following instructions need to be aligned on a certain boundary). For example, consider .align 2, then mov [bx+6], ax, then .align 2, then clc. With the smaller (3-byte) mov, an extra nop has to be inserted as padding before the clc so that it is aligned on a 2-byte boundary (as requested by the .align 2 directive); with the larger (4-byte) mov, no padding is needed (so it's one instruction fewer, but the same number of bytes in the resulting code).
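The arithmetic behind that alignment example can be checked with a few lines of Python. This assumes the mov itself starts at an aligned offset of 0, and padding_needed is a name invented for the sketch.

```python
def padding_needed(offset, alignment):
    """Number of padding bytes needed to round offset up to alignment."""
    return -offset % alignment


# The mov starts at offset 0, so the offset after it equals its size.
for mov_size in (3, 4):
    pad = padding_needed(mov_size, 2)
    print(f"{mov_size}-byte mov: {pad} padding byte(s), clc at offset {mov_size + pad}")
```

In both cases the clc ends up at the same offset; the only difference is whether one of the bytes before it is a separate nop.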