assembly x86 memory-segmentation

What does : (colon) mean in x86 addressing modes, between ES and the rest?


In this assembly instruction

mov ax, es:[bx]

what does the : do?


Solution

  • what specifically does the : do?

    The ":" doesn't "do" anything, in the same way that "." doesn't "do" anything in most high-level programming languages. It is purely syntax: a ':' appears in an operand of the form <segment register> : <address expression>. Every x86 memory operand has a "default segment register", which is used to determine the address the operand refers to. This is usually "ds", or "ss" when the address is based on (e)bp or (e)sp; string instructions use "es" for their destination. An instruction may use any of the CS, DS, ES, SS, FS, and GS segment registers instead, however, by including the appropriate "segment override" prefix byte in the instruction's binary encoding.
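    To make the prefix byte concrete, here is a minimal Python model (Python is used only to illustrate the byte layout, it is not an assembler). The six prefix values are the ones defined by the x86 encoding, and 8B 07 is the 16-bit encoding of mov ax, [bx]; the helper name with_segment_override is hypothetical.

```python
# Segment-override prefix bytes defined by the x86 instruction encoding.
SEGMENT_OVERRIDE_PREFIX = {
    "es": 0x26,
    "cs": 0x2E,
    "ss": 0x36,
    "ds": 0x3E,
    "fs": 0x64,
    "gs": 0x65,
}

# 16-bit encoding of `mov ax, [bx]`:
# opcode 8B (MOV r16, r/m16), ModRM 07 (mod=00, reg=AX, rm=[BX]).
MOV_AX_BX = bytes([0x8B, 0x07])


def with_segment_override(segment: str, encoded: bytes) -> bytes:
    """Prepend the override prefix for `segment` to an encoded instruction.

    Hypothetical helper for illustration only; a real assembler also
    decides when the prefix is redundant and can be left out.
    """
    return bytes([SEGMENT_OVERRIDE_PREFIX[segment.lower()]]) + encoded


# `mov ax, es:[bx]` is the same instruction with one extra leading byte.
print(with_segment_override("es", MOV_AX_BX).hex(" "))  # 26 8b 07
```

    The only difference between mov ax, [bx] and mov ax, es:[bx] at the machine level is that one leading byte, which is why the ':' is best read as notation rather than as an operation.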

    In 16-bit "real mode" programs the value in a segment register supplies the "higher order bits" of a memory address. It gets shifted left by 4 bits and added to the 16-bit offset specified in the instruction to generate the actual address referenced by the instruction. This allowed programs running on 16-bit hardware to access more than 64 KB of memory (up to about 1 MB), provided they could group memory into chunks of at most 64 KB that could be accessed relative to a segment register.
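    As a rough sketch of the real-mode calculation, assuming the usual rule that the segment value is multiplied by 16 and added to the offset (the function name and the sample register values are made up for illustration, and address wraparound at 1 MB on the original 8086 is ignored):

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Linear address of a 16-bit segment:offset pair in real mode."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 multiplies the segment by 16 (one "paragraph").
    return (segment << 4) + offset


# Illustrative register values, not taken from the question.
es, bx = 0x1234, 0x0010
print(hex(real_mode_address(es, bx)))          # 0x12350

# Different segment:offset pairs can name the same byte.
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350
```

    Because segments start every 16 bytes but span 64 KB, they overlap heavily, which is why two different pairs can refer to the same location.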

    In 32-bit (protected mode) programs the segment selector is actually an index into a table of descriptors, each of which describes a mapping, including a base address and a size (the "limit"). The address is computed by adding the base found in the indexed descriptor to the memory operand present in the instruction, after checking the operand against the limit.
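    The lookup described above can be sketched as follows. This is a simplified model, not how any particular operating system lays out its tables: the selector bit layout (index in bits 15..3, table indicator in bit 2, privilege level in bits 1..0) is the architectural one, but the table contents, the Descriptor class, and the exception name are hypothetical, and access-rights checks are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Descriptor:
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset within the segment, in bytes


class SegmentLimitError(Exception):
    """Stands in for the fault the CPU raises on a bad segment access."""


def decode_selector(selector: int) -> Tuple[int, str, int]:
    """Split a 16-bit selector into (table index, table name, RPL)."""
    index = selector >> 3
    table = "LDT" if selector & 0b100 else "GDT"
    rpl = selector & 0b11
    return index, table, rpl


def linear_address(gdt: List[Optional[Descriptor]], selector: int, offset: int) -> int:
    """Translate selector:offset to a linear address using a toy GDT."""
    index, table, _rpl = decode_selector(selector)
    if table != "GDT":
        raise NotImplementedError("this sketch only models the GDT")
    desc = gdt[index] if index < len(gdt) else None
    if desc is None:
        raise SegmentLimitError("null or missing descriptor")
    if offset > desc.limit:
        raise SegmentLimitError("offset is beyond the segment limit")
    return (desc.base + offset) & 0xFFFFFFFF


# Hypothetical table: entry 0 is the mandatory null descriptor,
# entry 1 is a flat 4 GB segment, entry 2 is a small 4 KB segment.
GDT = [None, Descriptor(0x00000000, 0xFFFFFFFF), Descriptor(0x7FFD0000, 0x0FFF)]

print(hex(linear_address(GDT, 0x0B, 0x1000)))  # flat segment: 0x1000
print(hex(linear_address(GDT, 0x13, 0x18)))    # small segment: 0x7ffd0018
```

    With the flat descriptor the offset passes through unchanged, which is the common case for most segment registers in a 32-bit program.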

    Most of the time, in 32-bit programs, most segment registers point to descriptors that cover the entire 32-bit address space. The primary exception is the "fs" register (on 32-bit Windows; 32-bit Linux uses "gs" in a similar way), whose base and size map to a special per-thread data structure defined by the operating system. It is used as one of the mechanisms for communication between kernel space and user space, and usually gives access to all the "user space visible" attributes of the kernel's representation of the current process or thread.

    64-bit programs almost completely eschew segmentation. The base addresses of all segment registers except FS and GS are treated as zero, so those registers behave as if they mapped the entire address space. FS and GS still contribute a base address, and how they are used depends on the operating system. On 64-bit Windows, the FS register is usually used to provide access to the current "32 bit context" of the executing program, and the GS register to the current "64 bit context". This allows 32-bit programs to run on 64-bit systems, but also gives the 64-bit kernel (and the mapping layer between 32-bit processes and 64-bit processes) access to the 64-bit context it needs to work. On x86-64 Linux, by contrast, user-space code uses FS for thread-local storage.

    So, to answer your original question:

    Probabilistically (given no knowledge about the mode of the processor or the operating system), the instruction:

    mov ax, es:[bx]
    

    is actually equivalent to:

    mov ax, [bx]
    

    However, the fact that it uses 16 bit registers indicates that it might be a real mode program, in which case it may mean:

    mov ax, [<addr>]
    

    where addr == (es << 4) + bx (the value held in bx itself, not the memory it points to)