I've been working on Lab 1 of MIT 6.828, and I have a question about the code that switches to protected mode. Here is the assembly code:
# Switch from real to protected mode, using a bootstrap GDT
# and segment translation that makes virtual addresses
# identical to their physical addresses, so that the
# effective memory map does not change during the switch.
lgdt gdtdesc
movl %cr0, %eax
orl $CR0_PE_ON, %eax
movl %eax, %cr0
# Jump to next instruction, but in 32-bit code segment.
# Switches processor into 32-bit mode.
ljmp $PROT_MODE_CSEG, $protcseg
While movl %eax, %cr0 is executing, the IP register holds the address of the next instruction, ljmp $PROT_MODE_CSEG, $protcseg. After movl %eax, %cr0 has executed, the CPU is running in protected mode, but the values of the CS and IP registers have not changed.
So the CPU must fetch the ljmp instruction using those same CS and IP values. How can that work?
(I mean that in protected mode the CPU addresses instructions using the segment selector stored in the CS register, but at this point CS does not hold a valid segment selector, so the CPU can't get the right physical address from an invalid segment selector.)
I've read the Intel architecture manual, but I can't find the answer.
The segment base address (and the other state associated with a segment) only changes when you write the associated segment register, in a way that depends on what mode the CPU is in when the write happens. (This is why "unreal" mode works.)
Until ljmp, nothing writes CS, so the CPU is still decoding in 16-bit mode, from the same CS:IP resolving to the same linear address as if that mov to cr0 hadn't had the protected-mode bit set.
Writing CR0 doesn't make the CPU reinterpret CS (or any of the other segment registers) as a segment selector and index the GDT with it.
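A toy model of the sequence may make this concrete. It is only a sketch under stated assumptions, not a description of everything the hardware does: the `Cpu` class and the flat `GDT` dictionary are made up for illustration, the addresses 0x7C2D and 0x7C32 are illustrative placements for the ljmp and for protcseg, and `CR0_PE_ON = 0x1` and `PROT_MODE_CSEG = 0x8` are assumed values for the constants in the question's code.

```python
CR0_PE_ON = 0x1          # assumed value of the protected-mode enable bit
PROT_MODE_CSEG = 0x8     # assumed selector of the flat code segment

# Hypothetical bootstrap GDT: selector -> segment base (flat, so base 0).
GDT = {PROT_MODE_CSEG: 0x0}


class Cpu:
    def __init__(self):
        self.cr0 = 0
        self.cs_selector = 0x0000   # visible part of CS
        self.cs_base = 0x0          # hidden cached base, set when CS was last written
        self.ip = 0

    def fetch_address(self):
        # Instruction fetch uses the cached base, never the GDT directly.
        return self.cs_base + self.ip

    def write_cr0(self, value):
        # Changing CR0 leaves CS and its cached base untouched.
        self.cr0 = value

    def ljmp(self, selector, offset):
        # A far jump writes CS; how the base is derived depends on the
        # mode the CPU is in at the time of the write.
        if self.cr0 & CR0_PE_ON:
            self.cs_base = GDT[selector]   # protected mode: descriptor lookup
        else:
            self.cs_base = selector << 4   # real mode: selector * 16
        self.cs_selector = selector
        self.ip = offset


cpu = Cpu()
cpu.ip = 0x7C2D                        # illustrative address of the ljmp
before = cpu.fetch_address()
cpu.write_cr0(cpu.cr0 | CR0_PE_ON)     # movl %eax, %cr0
after = cpu.fetch_address()            # still the same linear address
cpu.ljmp(PROT_MODE_CSEG, 0x7C32)       # ljmp $PROT_MODE_CSEG, $protcseg
print(hex(before), hex(after), hex(cpu.fetch_address()))
```

In this model the fetch address is identical before and after the CR0 write, because only `ljmp` touches `cs_base`. The GDT is consulted exactly once, at the moment CS is written with PE already set.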
Intel calls the segment register internals a "descriptor cache", but they're not really like the normal caches: they never invalidate themselves (e.g. in power-saving sleep states) and reload. They're more like registers and are part of the architectural state: they keep whatever value they were written with, regardless of whether the GDT / LDT entry they index (if any) matches their state or even exists. See also https://retrocomputing.stackexchange.com/questions/27035/how-can-a-32-bit-x86-cpu-start-with-reset-vector-0xfffffff0-even-though-it-start/27038#27038
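The register-like behaviour, and why "unreal" mode falls out of it, can be sketched the same way. This is a simplified model, not a full account of the hardware: `SegReg`, `load_protected`, `load_real`, the selector 0x10 and the flat descriptor are all hypothetical names and values chosen for illustration, and the real-mode load is modelled as updating only the base.

```python
from dataclasses import dataclass


@dataclass
class SegReg:
    selector: int = 0
    base: int = 0
    limit: int = 0xFFFF   # 64 KiB limit in effect after reset


def load_protected(reg, selector, gdt):
    # Protected-mode write: copy base and limit out of the descriptor table.
    base, limit = gdt[selector]
    reg.selector, reg.base, reg.limit = selector, base, limit


def load_real(reg, selector):
    # Real-mode write (simplified): base becomes selector * 16,
    # and the cached limit is left as it was.
    reg.selector, reg.base = selector, selector << 4


gdt = {0x10: (0x0, 0xFFFFFFFF)}   # hypothetical flat data descriptor
ds = SegReg()
load_protected(ds, 0x10, gdt)     # done while CR0.PE = 1
gdt.clear()                       # the table can vanish; the cached state stays
limit_after_gdt_gone = ds.limit
load_real(ds, 0x0000)             # back in real mode, DS is written again
print(hex(ds.base), hex(ds.limit))
```

The cached limit survives both the disappearance of the table and the later real-mode write, which is the state people call unreal mode: real-mode addressing with a 4 GiB limit.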