When running a boot-loader program on a modern-day x86 processor, the processor will be running in real-address mode. Will its instruction pipelining features be active in real mode, or not?
Yes, the out-of-order core in modern microarchitectures operates basically the same regardless of mode. Most of the difference is in the decoders. See Agner Fog's microarch pdf and other links in the x86 tag wiki for details of how modern CPUs actually do work internally.
It would probably take extra silicon to behave differently in 16bit mode, since it's very similar to 32bit mode with paging disabled, but with a different default address-size and operand-size.
I've read that AMD CPUs are slightly slower when segments have a non-zero base. (Or I guess in 16bit mode: when segment registers themselves are set to non-zero values, since in 16bit mode they're used directly, rather than being selectors for descriptors.)
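To make "used directly" concrete, here is a minimal Python sketch of real-mode address formation. It only models the arithmetic (segment value times 16, plus the 16-bit offset) and assumes the A20 line is enabled, so addresses just above 1 MiB don't wrap. The function name is made up for illustration.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address for a real-mode segment:offset pair (A20 assumed enabled)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment register value itself is the base: base = segment * 16.
    return (segment << 4) + offset
```

For example, 0x07C0:0x0000 and 0x0000:0x7C00 both name the boot-sector load address 0x7C00, but only the second form uses a zero segment base.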
Keep in mind that many common 16bit idioms, like the loop instruction, are terrible for performance.
Also, partial-register slowdowns can easily interfere with out-of-order execution if you aren't careful. Intel P6-family and SnB-family CPUs rename partial registers separately, so writing to AX doesn't have a false dependency on the full contents of EAX/RAX. The cost comes later, when the full register is read and the partial result has to be merged: that can stall on CPUs before SnB, and causes just minor slowdowns on SnB-family CPUs before Haswell.
All other microarchitectures treat mov ax, 5 as a read-modify-write of eax, so it doesn't break the dependency chain on the old value of eax. This can be a huge problem for out-of-order execution if you aren't careful.
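The read-modify-write behaviour follows from the architectural result of a 16-bit write. Here is a small Python model of it. This is only a sketch of what the register ends up holding, with hypothetical function names, and it says nothing about timing on any particular CPU.

```python
def write_ax(eax: int, value: int) -> int:
    """Model `mov ax, value`: the upper 16 bits of the old eax survive,
    so the result is a function of the old register contents."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


def write_eax(eax: int, value: int) -> int:
    """Model `mov eax, value`: the old contents are ignored entirely,
    so there is no input dependency on the previous value."""
    return value & 0xFFFFFFFF
```

Because write_ax needs the old eax as an input, a CPU that doesn't rename AX separately has to wait for whatever produced that old value. A full-width write has no such input.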
Read Agner Fog's manuals to learn more.
16bit addressing modes might not perform well, I forget. 32bit code doesn't need them to be fast, and 64bit code can't use 16bit addresses at all. (The address-size prefix in 64bit code means address-size = 32bits.)
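The effect of the address-size prefix (byte 0x67) in each mode can be summarized as a small lookup table. This is a sketch in Python with made-up names; "16" here means code running with a 16-bit default, such as real mode.

```python
# Address size in bits: (default, with the 0x67 address-size prefix).
ADDRESS_SIZE = {
    16: (16, 32),  # 16-bit code: the prefix selects 32-bit addressing
    32: (32, 16),  # 32-bit code: the prefix selects 16-bit addressing
    64: (64, 32),  # 64-bit code: the prefix selects 32-bit, never 16-bit
}


def effective_address_size(mode_bits: int, has_67_prefix: bool) -> int:
    """Return the address size used by an instruction in the given mode."""
    default, overridden = ADDRESS_SIZE[mode_bits]
    return overridden if has_67_prefix else default
```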
VEX-coded instructions (including all AVX and some BMI1 and BMI2 integer instructions like blsr and pext) aren't available in real or VM86 mode. This Intel forum topic (dead link, see footnote 1) suggested that may be due to existing software (NTVDM) using the machine code as a trap to protected mode (i.e. the same illegal operands to LDS/LES that VEX uses). Making VEX-coded instructions still generate #UD is thus important for backwards compatibility. Michael Petch commented:
0xc4 0xc4 0x60 (Vm version number) and 0xc4, 0xc4, 0x58 were in pretty common use in 16-bit code in the mid 90s even before NTVDM. They were commonly used by those of us trying to determine if we were running in SoftPC. Back then they were sparsely documented as BOP codes. Microsoft semi-documented them with the NT Device driver kit back in the mid 90s. This wasn't unsurprising because NTVDM was based on SoftPC. I pulled out the old NT DDK CD and they can be found in the file ISVBOP.h.
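To see why those byte sequences collide with VEX: LES (opcode 0xC4) and LDS (opcode 0xC5) require a memory operand, so a following ModRM byte with mod = 11 (register form) is an illegal encoding. That illegal form is the space VEX prefixes reuse outside 64-bit mode. A minimal Python sketch of the check, with a hypothetical function name:

```python
def is_register_form_modrm(next_byte: int) -> bool:
    """True if the byte after 0xC4/0xC5 has ModRM.mod == 0b11.

    For LES/LDS that is an illegal encoding (they need a memory operand),
    so in real mode it raises #UD instead of decoding as a VEX prefix.
    """
    return ((next_byte >> 6) & 0b11) == 0b11
```

The second 0xc4 in the BOP sequences has its top two bits set, so 0xc4 0xc4 is exactly such an illegal LES. A byte like 0x06, which has mod = 00, would be an ordinary LES with a memory operand.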
SSE is still available in real mode, though, if you enable it with the right CR setting.
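As I understand the usual setup described in Intel's manuals, "the right CR setting" means clearing CR0.EM, setting CR0.MP, and setting CR4.OSFXSR (plus CR4.OSXMMEXCPT if you want SIMD floating-point exceptions delivered as #XM). The Python below is only a sketch of the bit arithmetic with made-up helper names. Real code would read and write CR0/CR4 with mov instructions, after checking CPUID for SSE support.

```python
CR0_MP = 1 << 1            # monitor coprocessor
CR0_EM = 1 << 2            # x87 emulation; must be clear to use SSE
CR4_OSFXSR = 1 << 9        # OS supports FXSAVE/FXRSTOR; enables SSE instructions
CR4_OSXMMEXCPT = 1 << 10   # OS handles unmasked SIMD floating-point exceptions


def cr0_for_sse(cr0: int) -> int:
    """Return the CR0 value with EM cleared and MP set; other bits unchanged."""
    return (cr0 & ~CR0_EM) | CR0_MP


def cr4_for_sse(cr4: int) -> int:
    """Return the CR4 value with OSFXSR and OSXMMEXCPT set; other bits unchanged."""
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT
```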
(VEX/EVEX are available in 16-bit protected mode, but not real or virtual-8086 mode. See the related Q&A "Is x86 32-bit assembly code valid x86 64-bit assembly code?")
Footnote 1: dead link, not archived in wayback machine. Intel may have just reorganized their forum URLs, but I didn't go looking.