assembly x86 nasm bootloader memory-segmentation

How does an assembler find the offset of a label without knowing the value of the segment register?


I am learning about some simple x86 bootloader code and having some trouble understanding how the assembler (nasm in my case) calculates the offsets of labels.

It is my understanding that a data label like letter below represents the offset of the following byte within the data segment. It is also my understanding that in an instruction like mov al, [letter], the label is implicitly interpreted as ds:letter. This means that the assembler needs to know the offset of letter within the data segment in order to reference the correct address.

What I don't understand is how the assembler calculates this offset. To figure out the offset, it would need to know where the start of the data segment is. But it doesn't know this since it doesn't know the value of ds at the assembly stage. So how is the offset calculated?

[org 0x7c00]
bits 16

; initialise data segment register
mov ax, 0
mov ds, ax

; print letter
mov al, [letter]
mov ah, 0x0e
int 0x10

end:
jmp $


; data label
letter:
db 'A'


times 510 - ($ - $$) db 0
db 0x55, 0xaa

This code prints the letter A as expected. But I note that if I change ds to some other value, say 10, it doesn't print anything. So clearly the assembler is calculating the offset independently of the value of ds (so there's no guarantee ds:letter would reach the right address) -- I just don't know how it's calculating it.


Solution

  • With help from users ecm and rcgldr, I've figured out the source of my confusion, so I'll summarise the answer to my question below in case others have the same confusion.

    My mistake was to think that, because a segment register defines where a segment starts, the assembler would need to know the value of that register before it could calculate an offset.

    It doesn't. The assembler never sees the segment registers: it computes each label's offset from the origin given by org plus the label's position in the output, and the processor combines that offset with the segment register only at run time (in real mode, physical address = segment * 16 + offset). When we write our program, we need a plan for where the segments should start and where the bytes of our program should sit relative to the start of those segments. That is, we should already know the segment and program layout before assigning segment registers and offsets in the program.

    So the offsets do not depend on the values we assign to the segment registers. Rather, the offsets and the segment register values both depend on the segment layout we planned in advance. It is up to us to give both the right values so that the processor reaches the addresses we intended: we load the segment registers to set the starting addresses of the segments, and we use directives such as org and section to position the bytes within the segments. In the code above, the BIOS loads the boot sector at physical address 0x7c00, so org 0x7c00 together with ds = 0 reaches letter. With ds = 10 the same offset is added to 10 * 16 = 160, so the read lands 160 bytes past letter.
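
    As a sketch of the same idea under a different plan (this variant is my own illustration, not something from the discussion): assume the BIOS has loaded the sector at physical address 0x7c00, and suppose we want the data segment to start exactly there. Offsets then count from 0, so org becomes 0 and ds becomes 0x07c0, since 0x07c0 * 16 = 0x7c00. Only the org value and the value loaded into ds differ from the code in the question.

    ```nasm
    ; Plan: the data segment starts at physical 0x7c00 (assumed load address),
    ; so every label is an offset counted from 0.
    org 0                ; label value = 0 + position in the output file
    bits 16

    ; initialise data segment register to match the plan
    mov ax, 0x07c0       ; segment base = 0x07c0 * 16 = 0x7c00
    mov ds, ax

    ; print letter
    mov al, [letter]     ; CPU reads physical address ds * 16 + letter
    mov ah, 0x0e         ; BIOS teletype output function
    int 0x10

    end:
    jmp $                ; loop forever

    ; data label
    letter:
    db 'A'

    times 510 - ($ - $$) db 0   ; pad to 510 bytes
    db 0x55, 0xaa               ; boot signature
    ```

    Both pairings should reach the same physical byte: 0 * 16 + (0x7c00 + n) in the original and 0x07c0 * 16 + n here, where n is the position of letter in the file. Mixing them (org 0x7c00 with ds = 0x07c0, or org 0 with ds = 0) would break the plan in the same way that setting ds to 10 did.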