
What is the proper octal representation of the encoding of the operand register in the Intel 8086?


The classic explanation of Intel opcodes in octal says this:

   As an example to see how this works, the mov instructions in octal are:

210 xrm         mov Eb, Rb
211 xrm         mov Ew, Rw
212 xrm         mov Rb, Eb
213 xrm         mov Rw, Ew
214 xsm         mov Ew, SR
216 xsm         mov SR, Ew

The meanings of the octal digits (x, m, r, s) and their correspondence to the
operands (Eb, Ew, Rb, Rw, SR) are the following:

The digit r (0-7) encodes the register operand as follows:
REGISTER (r):                0   1   2   3   4   5   6   7
   Rb = Byte-sized register AL  CL  DL  BL  AH  CH  DL  BH
   Rw = Word-sized register AX  CX  DX  BX  SP  BP  SI  DI
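A minimal Python sketch of why octal fits this scheme, assuming (as is usual for these octal tables) that x is the 2-bit mod field of the ModRM byte, r the 3-bit reg field and m the 3-bit r/m field. Because the byte splits 2+3+3 bits, its three octal digits are exactly x, r and m. The helper name `modrm_octal_digits` is made up for illustration.

```python
def modrm_octal_digits(modrm: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its three octal digits (x, r, m).

    x = mod field (bits 7-6), r = reg field (bits 5-3), m = r/m field (bits 2-0).
    The 2+3+3 bit split lines up exactly with the three octal digits of the byte.
    """
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModRM must be a single byte")
    x = (modrm >> 6) & 0o3
    r = (modrm >> 3) & 0o7
    m = modrm & 0o7
    return x, r, m


# 89 D8 is one encoding of mov ax, bx: opcode 0o211 (mov Ew, Rw), ModRM 0o330
modrm = 0xD8
print(f"{modrm:03o}")             # 330
print(modrm_octal_digits(modrm))  # (3, 3, 0)
```

Here x=3 selects register-direct mode, r=3 is BX (the Rw operand) and m=0 is AX (the Ew operand), matching the `211 xrm  mov Ew, Rw` row.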

Why is entry 6 of the Rb row DL instead of DH, breaking the high-byte pattern?

While I'm asking, is there a more up-to-date octal explanation of the 8086 Intel opcodes, one that was not written in the 1990s?


Solution

  • That's a typo: DL appears twice and DH appears nowhere in that table.

    You're right: it follows the pattern of four low-half registers followed by four high-half registers, as you can see by assembling mov dl, 0 and mov dh, 0, where the destination register number is the low 3 bits of the opcode. Pick any popular, non-buggy assembler; they all get this right. (NASM is good; clang and the GNU assembler (GAS) are also decent choices, although GAS has less helpful error messages.)
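    A small Python sketch of that check, using the corrected 8-bit table (AL CL DL BL AH CH DH BH). It relies on the 8086 short form of mov r8, imm8, which is opcode B0h + r followed by the immediate byte. The function name `encode_mov_r8_imm8` is hypothetical, and this is only an illustration, not a substitute for a real assembler.

```python
# Corrected 8086 register-number tables (index = r, 0..7).
# 8-bit: the four low halves, then the four high halves.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_mov_r8_imm8(reg: str, imm: int) -> bytes:
    """Encode `mov r8, imm8` in its short form: opcode B0h + r, then the immediate byte."""
    r = REG8.index(reg.upper())  # register number goes in the low 3 bits of the opcode
    return bytes([0xB0 + r, imm & 0xFF])


for name in ("DL", "DH"):
    code = encode_mov_r8_imm8(name, 0)
    print(f"mov {name.lower()}, 0 -> {code.hex(' ').upper()}  (opcode {code[0]:03o} octal)")
```

    It should print B2 00 for DL and B6 00 for DH. In octal the opcodes are 262 and 266, so the last octal digit is the register number, 2 or 6.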

    is there a more up to date octal explanation of the 8086 intel opcodes that was not written in the 90s?

    Intel's manual is up to date but aims for precision over clarity and readability. It sometimes doesn't mention patterns that exist in the encodings, such as how the low 2 bits of most opcodes distinguish width and direction (8-bit vs. 16/32/64-bit, and memory source vs. destination).
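    For the mov opcodes quoted in the question (88h-8Bh, octal 210-213), here is a Python sketch of that pattern: bit 0 is the width bit w and bit 1 is the direction bit d, so the last octal digit carries both. `decode_mov_88_8b` is a made-up name, and the sketch only covers these four opcodes.

```python
def decode_mov_88_8b(opcode: int) -> str:
    """Describe the d (direction) and w (width) bits of the 8086 mov opcodes 88h-8Bh.

    In octal these are 210-213, so the last octal digit holds both bits.
    """
    if not 0x88 <= opcode <= 0x8B:
        raise ValueError("expected one of 88h..8Bh (octal 210..213)")
    w = opcode & 1         # bit 0: 0 = byte operands, 1 = word operands
    d = (opcode >> 1) & 1  # bit 1: 0 = reg field is the source, 1 = reg field is the destination
    size = "w" if w else "b"
    if d:
        return f"mov R{size}, E{size}"  # register <- reg/mem (memory source)
    return f"mov E{size}, R{size}"      # reg/mem <- register (memory destination)


for op in range(0x88, 0x8C):
    print(f"{op:03o}  {decode_mov_88_8b(op)}")
```

    The four printed rows should reproduce the 210-213 mov lines from the table in the question.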

    https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers is quite good and has a correct table of register numbers. It's for x86-64, so it includes the extra register-number bit that a REX prefix can supply. (Also, the mere presence of a REX prefix changes the meaning of the 8-bit register numbers 4 through 7 from AH-BH to SPL-DIL, the low 8 bits of RSP through RDI. So you can't do mov ah, r8b: R8B needs a REX prefix, but a REX prefix makes AH inaccessible.)
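    A rough Python model of that REX rule for 8-bit register operands. Register names follow NASM-style spelling such as r8b, the helper names are hypothetical, and the model ignores everything about an instruction except which 8-bit registers it names.

```python
HIGH8 = {"AH", "CH", "DH", "BH"}  # encodable only when there is no REX prefix at all
REX_ONLY8 = {"SPL", "BPL", "SIL", "DIL"} | {f"R{i}B" for i in range(8, 16)}

LEGACY_REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REX_REG8 = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"] + [f"R{i}B" for i in range(8, 16)]


def reg8_name(num: int, rex_present: bool) -> str:
    """Name of 8-bit register number `num`, given whether the instruction has a REX prefix."""
    if not 0 <= num <= 15:
        raise ValueError("8-bit register numbers are 0-15")
    if rex_present:
        return REX_REG8[num]  # 4-7 now mean SPL, BPL, SIL, DIL
    if num > 7:
        raise ValueError("register numbers 8-15 need a REX prefix")
    return LEGACY_REG8[num]


def can_encode_mov_r8(dst: str, src: str) -> bool:
    """Can `mov dst, src` between two 8-bit registers be encoded at all?"""
    regs = {dst.upper(), src.upper()}
    needs_rex = bool(regs & REX_ONLY8)  # a REX prefix is unavoidable
    uses_high8 = bool(regs & HIGH8)     # AH..BH exist only without REX
    return not (needs_rex and uses_high8)


print(reg8_name(4, rex_present=False), reg8_name(4, rex_present=True))  # AH SPL
print(can_encode_mov_r8("ah", "bl"))   # True
print(can_encode_mov_r8("ah", "r8b"))  # False
```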

    Most documentation uses hex or decimal, or groups of binary digits, because with REX, VEX, and EVEX prefixes supplying additional register-number bits, the fields no longer always fall into groups of 3 bits. (And because octal isn't widely used anymore.)
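    A Python sketch of why fixed 3-bit groups stop working in 64-bit code: the REX.R bit (bit 2 of a 40h-4Fh REX prefix) becomes the top bit of the ModRM reg register number, and REX.B (bit 0) does the same for the r/m field. For 4C 89 C8 (mov rax, r9), the ModRM byte is still octal 310, but the reg number is 9, which needs a bit from a different byte. The helper names are hypothetical.

```python
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
         "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]


def check_rex(rex: int) -> None:
    if (rex & 0xF0) != 0x40:
        raise ValueError("not a REX prefix (40h-4Fh)")


def reg_number(rex: int, modrm: int) -> int:
    """REX.R (bit 2 of the REX byte) supplies bit 3 of the ModRM.reg register number."""
    check_rex(rex)
    return (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 0o7)


def rm_number(rex: int, modrm: int) -> int:
    """REX.B (bit 0 of the REX byte) supplies bit 3 of the ModRM.r/m register number."""
    check_rex(rex)
    return ((rex & 1) << 3) | (modrm & 0o7)


rex, modrm = 0x4C, 0xC8  # 4C 89 C8 = mov rax, r9
n = reg_number(rex, modrm)
print(f"ModRM in octal: {modrm:03o}")                   # 310
print(REG64[rm_number(rex, modrm)], REG64[n])           # RAX R9
print(f"reg number: {n} decimal, {n:o} octal, {n:x} hex")
```

    The ModRM reg digit alone is still 1; only together with REX.R does it become register 9, which spans two octal digits but a single hex digit.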