The Intel 8086 has a 20-line address bus, so it can address 2^20 memory locations. However, as it is easier to work with 16-bit words, the 8086 uses only 16-bit values to address memory.
How is it possible to use 16 bits to address 2^20 addresses, you ask? Using the segment:offset approach. The segment base address is the 16-bit word denoting the starting address of a memory segment, and the offset is another 16-bit word denoting the distance of the relevant memory address from the segment base address.
Four zero bits are appended to the end of every base address to make it a 20-bit word, because ultimately we do need 20-bit addresses to address the memory locations.
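In other words, the physical address is the segment shifted left by 4 bits (one hex digit) plus the offset. A quick sketch of that computation (the function name is mine, purely illustrative):

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit 8086 physical address."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    # The segment is shifted left 4 bits (appending four zero bits) before adding the offset.
    # On a real 8086 the sum wraps at 20 bits, hence the mask.
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x1000, 0x0000)))  # 0x10000
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 -- wraps around at 1 MiB
```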
Now, the address space is divided into 4 memory segments. Each memory segment is 64KiB (2^16 bytes) in size and is associated with a base segment address. So, if there is no overlap between the memory segments, then using the segment:offset addressing scheme we can access 4*65536 = 2^18 unique memory locations at a time.
Questions:
> Now, the address space is divided into 4 memory segments.
Saying "divided into" is misleading. The 20-bit physical address space consists of 65536 different segments, each of size 65536 bytes, at 16-byte intervals in memory, and they overlap. So segment 0x0000 occupies physical addresses 0x00000-0x0ffff, segment 0x0001 occupies physical addresses 0x00010-0x1000f, and so on.
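To make the overlap concrete, here's a small illustrative sketch (my own code, not from any toolchain) printing the physical range covered by a few consecutive segment values:

```python
def segment_range(segment: int) -> tuple[int, int]:
    """Physical addresses covered by the 64 KiB segment starting at segment << 4."""
    base = segment << 4
    return base, base + 0xFFFF

for seg in (0x0000, 0x0001, 0x0002):
    lo, hi = segment_range(seg)
    print(f"segment {seg:#06x}: {lo:#07x}-{hi:#07x}")
# Consecutive segment values start just 16 bytes apart,
# so adjacent segments overlap almost entirely.
```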
When you access memory with a machine instruction, the segment part of the address is taken from one of the 4 segment registers (CS, DS, ES, SS). So in that sense, you're correct that there is a maximum of 2^18 bytes that you can access without modifying the segment registers.
However, should you want to access some other part of the 2^20-byte address space, you can do so simply by loading a new segment value into one of the segment registers (usually DS or ES). This is a routine part of x86-16 programming; it shouldn't be thought of as some unusual thing. On the original 8086, moving a new value into a segment register is an inexpensive operation; for instance, `mov es, cx` is the same 2 clock cycles as `mov bx, cx`. (If it's already prefetched: 8088 performance is normally limited by code-fetch, since every memory access takes 4 cycles.) As Raymond Chen reminds me, one can even load a segment register and a general-purpose register from a far pointer in memory with a single `LDS`/`LES` instruction.
Thus I think it's likewise misleading to think of the rest of memory, beyond the 4 segments corresponding to the current values of CS, DS, ES, SS, as being "not accessible". All of memory is accessible to you at any time; it's just that if you want a byte that isn't located in one of those 4 segments, then accessing it requires the extra step of loading a segment register. That just means that it needs an extra instruction, or a different but slower instruction.
So to your point (1), it's true that there's a cost to be paid for being able to use 16-bit offset addresses. But I wouldn't think of that cost as being that "only 2^18 bytes are accessible". Rather, I'd say the cost is merely that some accesses are somewhat more expensive in execution time and code space, because they require loading a segment register first.
It's somewhat analogous to the previous generation of 8-bit CPUs with 16-bit address spaces, like the Intel 8080. There, to access memory, you have to load the two 8-bit halves of your 16-bit address into two separate 8-bit registers (say B and C, likely requiring two or more instructions), and then execute a load instruction like `LDAX B`, which gets the memory address from the pair BC, concatenating the two 8-bit values into a 16-bit value. But that doesn't mean that only 2^8 bytes of memory are "accessible".
Likewise, on the 8086, you could load the 16-bit segment and offset parts of an arbitrary 20-bit address into DS and BX respectively, and then `MOV AX, [BX]` gets its memory address from the pair DS:BX, via the shift-and-add computation `(DS << 4) + BX`. So really the only difference is how the address is computed from the register pair: by shift-and-add instead of simple concatenation.
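The parallel can be sketched side by side: the 8080 forms an address by concatenation, the 8086 by shift-and-add (illustrative code, not tied to any emulator API):

```python
def addr_8080(b: int, c: int) -> int:
    """8080-style: concatenate two 8-bit registers into a 16-bit address."""
    return (b << 8) | c

def addr_8086(ds: int, bx: int) -> int:
    """8086-style: shift the 16-bit segment left 4 bits and add the 16-bit offset."""
    return ((ds << 4) + bx) & 0xFFFFF

print(hex(addr_8080(0x12, 0x34)))      # 0x1234
print(hex(addr_8086(0x1000, 0x1234)))  # 0x11234
```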
As to (2), this isn't really an issue; it's always the program's responsibility to know where its code and data are located. So let's imagine a program with 4 KB of data, in segment 0x1000, and 4 KB of code, in segment 0x1100. Then presumably it sets DS = 0x1000 and CS = 0x1100. This means that only offsets 0x0000-0x0fff in segment DS correspond to actual data, and so when accessing data, your program should only access those addresses.
Sure, if you do `MOV [BX], AX` with BX = 0x1234, then you're going to overwrite some code (because `0x1000:0x1234` is the same physical address as `0x1100:0x0234`), and yes, future instruction fetches from that address would execute the new contents. But, well, don't do that. If your program ever tries to access anything outside offsets 0x0000-0x0fff, expecting that it's part of the program's data, then it has a bug and you should fix it. (An exception, of course, would be intentional self-modifying code; in that case, the ability to overwrite your code is a feature.)
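You can check the aliasing claim arithmetically (a quick sketch; the helper name is mine):

```python
def physical(segment: int, offset: int) -> int:
    # 8086 physical address: segment shifted left 4 bits, plus offset, mod 2^20.
    return ((segment << 4) + offset) & 0xFFFFF

# A write at DS:BX = 0x1000:0x1234 lands on the same byte as 0x1100:0x0234:
assert physical(0x1000, 0x1234) == physical(0x1100, 0x0234) == 0x11234
```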
So yes, you can shoot yourself in the foot with this mechanism, but in some ways it's actually a little safer than if we had a flat address space. If you can arrange that your data segment doesn't overlap any actual code (say, by putting code in segment 0x1000 and data in segment 0x1100), then `MOV [BX], AX` can't overwrite code, unless you first explicitly load DS with some other value than 0x1100.
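With the layout swapped so that the code sits entirely below the data segment's base, no data write through DS can reach the code; a quick arithmetic check (illustrative only):

```python
CODE_SEG, DATA_SEG = 0x1000, 0x1100
CODE_SIZE = 0x1000  # 4 KB of code at physical 0x10000-0x10fff

def physical(segment: int, offset: int) -> int:
    return ((segment << 4) + offset) & 0xFFFFF

code_end = physical(CODE_SEG, 0) + CODE_SIZE - 1  # 0x10fff, last code byte
data_lowest = physical(DATA_SEG, 0x0000)          # 0x11000, lowest reachable via DS
# Every offset 0x0000-0xffff through DS = 0x1100 maps to 0x11000-0x20fff,
# all strictly above the code, so no in-range data write can clobber it.
assert data_lowest > code_end
assert physical(DATA_SEG, 0xFFFF) == 0x20FFF
```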