assembly x86 x86-16 cpu-registers memory-segmentation

8086: why can't we move immediate data into a segment register?

In 8086 assembly programming, we can only load data into a segment register by first loading it into a general-purpose register (or memory) and then moving it from there to the segment register.

For example, these are both valid:

  mov    ax, 5000h
  mov    ds, ax

;;; or if you don't have a spare register (push imm needs a 186 or later; pop ds is not valid in 64-bit mode)
  push   5000h
  pop    ds

But mov ds, 5000h is not a valid x86 instruction.

Why can't we load it directly from an immediate? Is there any special reason why this is not allowed?

Solution

Remember that the syntax of assembly language (any assembly) is just a human-readable way to write machine code. The rules of what you can do in machine code depend on how the processor's electronics were designed, not on what the assembler syntax could easily support.

So, even though it looks like you could write mov DS, 5000h, and conceptually there seems to be no reason why you shouldn't be able to, the real question is: "is there a mechanism by which the processor can load a segment register directly from an immediate value?"

In the case of the 8086, I figure the reason is simply that the engineers didn't provide a path (or an opcode) that could feed an immediate operand from the instruction stream to the lines that write to the segment registers. The only MOV forms that write a segment register take their source from a general-purpose register or from memory.

Why? I have several theories, but no authoritative knowledge.

The most likely reason is simply one of simplifying the design: it takes extra wiring and gates to do that, and loading a segment register from a constant was an uncommon enough operation (this was the 1970s) that it wasn't worth the real estate on the chip. This is not surprising; the 8086 had already gone overboard by allowing any of the normal registers to be connected to the ALU (arithmetic logic unit), which allows any register to be used as an accumulator. I'm sure that wasn't cheap to do. Most processors at the time only allowed one register (the accumulator) to be used for that purpose.
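For what it's worth, here is a small sketch of the segment-register loads that do have encodings, next to the one that doesn't. It assumes NASM syntax and a DOS .COM-style flat binary (my choice for the illustration, not something from the question), and the label seg_val is a made-up name. The byte values in the comments are the standard 16-bit encodings.

```nasm
; Sketch only: NASM syntax, 16-bit code, DOS .COM layout assumed.
; seg_val is a hypothetical label used purely for illustration.
        bits 16
        org  100h

start:
        mov  es, word [seg_val] ; 8E 06 lo hi : memory -> segment register
        mov  ax, 5000h          ; B8 00 50    : immediate -> general register
        mov  ds, ax             ; 8E D8       : general register -> segment register

        ; mov ds, 5000h         ; no encoding: opcode 8E takes a ModRM byte whose
                                ; source is a register or memory, never an immediate
        ret                     ; return to DOS (valid for a .COM program)

seg_val: dw 5000h               ; the constant, stored in memory
```

Assembling this with something like nasm -f bin should work as written; uncommenting the mov ds, 5000h line should make the assembler reject it, because there is no opcode for it to emit.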

As for the brackets, you are correct. Suppose the word at memory address 5000h (relative to DS) contains 4321h. mov ax, 5000h puts the value 5000h into ax, while mov ax, [5000h] loads 4321h from memory into ax. Essentially, the brackets act like the * pointer dereference operator in C.

Just to highlight the fact that assembly is an idealized abstraction of what machine code can do, note that the two variations are not the same instruction with different parameters, but completely different opcodes. They could have used, say, MOV for the first and MVD (MoVe Direct addressed memory) for the second opcode, but they must have decided that the bracket syntax was easier for programmers to remember.
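To make the "different opcodes" point concrete, here is roughly what an assembler emits for the two forms in 16-bit code. This is a sketch in NASM syntax (an assumption on my part; other assemblers may treat bare brackets differently), with the standard encodings shown in the comments.

```nasm
; Sketch only: NASM syntax, 16-bit code.
        bits 16

        mov  ax, 5000h          ; B8 00 50    : opcode B8, AX = the constant 5000h
        mov  ax, [5000h]        ; A1 00 50    : opcode A1, AX = word at DS:5000h

        mov  bx, 5000h          ; BB 00 50    : opcode BB, BX = the constant 5000h
        mov  bx, [5000h]        ; 8B 1E 00 50 : opcode 8B plus ModRM, BX = word at DS:5000h
```

The mnemonic is the same in every line, but the first byte differs each time; the brackets are what tell the assembler which opcode to pick.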