assembly x86 x86-16 gnu-assembler protected-mode

What's the difference between .code16 and .code32?


I am learning to write an operating system kernel for the i386 by watching some videos. I have learned the basic steps for entering protected mode:

In a .code16 file, I first need to enable the A20 address line and modify the CR0 register, and then I need to ljmp into .code32 code.

Now I am wondering about the differences between .code16 machine code and .code32 machine code.

These are my questions:

  1. Is it valid to use .code16 code in protected mode?
  2. What's the difference between the .code16 machine code and the .code32 machine code generated by the assembler?
  3. I found that it is valid to execute .code16 code after I set the CR0 register and before the ljmp. Why is that?
  4. My teacher told me .code16 means "generate code specified to 16-bit mode" and .code32 means "generate code specified for 32-bit mode", so what does that mean?

Sorry for my ignorance; I am a beginner in this field.


Solution

  • What's the difference between .code16 machine code and .code32 machine code generated by assembler?

    The CPU interprets the same code bytes differently in 16-bit modes (real mode and 16-bit protected mode) than in 32-bit protected mode.

    The main difference is that the meanings of the instruction prefixes 66 and 67 (hexadecimal) are reversed:

    In 16-bit modes, the CPU uses 16-bit registers and constants and i8086-type addressing modes by default. The prefix 66 tells the CPU to use 32-bit registers and constants; the prefix 67 tells the CPU to use i80386-type addressing modes:

    Program bytes   Instruction understood by the CPU
          8b 08     mov cx,[bx+si]
    66    8b 08     mov ecx,[bx+si]
    67    8b 08     mov cx,[eax]
    66 67 8b 08     mov ecx,[eax]
    

    In 32-bit protected mode, it is the other way round:

          8b 08     mov ecx,[eax]
    ...
    66 67 8b 08     mov cx,[bx+si]
    

    "generate code specified to 16/32-bit mode" ... so what does that mean?

    If one line of your program is mov ecx,[eax], the assembler writes 8b 08 in .code32 mode and 66 67 8b 08 in .code16 mode.

    ... because the CPU interprets 8b 08 as mov ecx,[eax] when operating in 32-bit mode and it interprets 66 67 8b 08 as mov ecx,[eax] when operating in 16-bit mode.
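
    As a rough illustration of the point above, the same two source lines can be assembled under both directives. This is only a sketch in AT&T syntax for the GNU assembler; the file name demo.s is hypothetical, and the order in which the assembler emits the two prefixes is not guaranteed (the CPU accepts either order).

    ```asm
            # demo.s (hypothetical file name)
            .text

            .code16                     # assume the CPU will run this in a 16-bit mode
            movw    (%bx,%si), %cx      # mov cx,[bx+si]   -> 8b 08, no prefix needed
            movl    (%eax), %ecx        # mov ecx,[eax]    -> 8b 08 plus prefixes 66 and 67

            .code32                     # assume the CPU will run this in 32-bit mode
            movl    (%eax), %ecx        # mov ecx,[eax]    -> 8b 08, no prefix needed
            movw    (%bx,%si), %cx      # mov cx,[bx+si]   -> 8b 08 plus prefixes 66 and 67
    ```

    Assembling with `as --32 -al demo.s` should print a listing that includes the generated bytes, so the prefixes can be compared line by line.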

    Is it valid to use .code16 code in protected mode?

    Above, I mentioned a "16-bit protected mode".

    Strictly speaking, there is no separate "16-bit protected mode"; there is only one "protected mode". In protected mode, you can create both 16-bit and 32-bit descriptors in the GDT (or LDT).

    To execute 16-bit code in protected mode, you must create a 16-bit code descriptor (in the GDT or the LDT) and perform an ljmp to that code.

    (Executing 16-bit code in protected mode is required to switch a 32-bit CPU from protected mode back to real mode.)

    Note that the descriptors for 16-bit code (and for the stack!) must have a size of 64 KiB or less. This means that you cannot create a single descriptor describing the whole 4 GiB of memory (as is done for 32-bit code); instead, it might be necessary to create multiple descriptors for code that is located in different parts of memory.
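
    To make the difference between the two kinds of descriptor concrete, here is a minimal sketch of one 32-bit and one 16-bit code descriptor as they might appear in a GDT. The labels desc_code32 and desc_code16 are hypothetical, a base address of 0 is assumed for both, and the selector of each entry depends on where it is placed in the table.

    ```asm
            .data
            .align  8

    desc_code32:                        # flat 32-bit code segment (hypothetical label)
            .word   0xFFFF              # limit bits 0..15
            .word   0x0000              # base bits 0..15
            .byte   0x00                # base bits 16..23
            .byte   0x9A                # present, ring 0, code segment, readable
            .byte   0xCF                # G=1 (4 KiB units), D=1 (32-bit), limit bits 16..19 = 0xF
            .byte   0x00                # base bits 24..31

    desc_code16:                        # 16-bit code segment, 64 KiB (hypothetical label)
            .word   0xFFFF              # limit bits 0..15 (64 KiB - 1)
            .word   0x0000              # base bits 0..15
            .byte   0x00                # base bits 16..23
            .byte   0x9A                # present, ring 0, code segment, readable
            .byte   0x00                # G=0 (byte units), D=0 (16-bit), limit bits 16..19 = 0
            .byte   0x00                # base bits 24..31
    ```

    The only fields that differ are the granularity bit, the D bit and the upper limit bits. The D bit is what tells the CPU whether code in that segment is decoded as 16-bit or 32-bit code.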

    I found that it is valid to execute .code16 code after I set the CR0 register and before the ljmp. Why is that?

    Internally, the segment registers (cs, ds ...) seem to be about 80 bits long but only 16 of these 80 bits are visible to the programmer.

    One of the "hidden" bits of the cs register specifies whether the CPU executes 16-bit or 32-bit code. (In protected mode, this bit is read from the GDT or LDT.)

    According to what I have read about the so-called "unreal mode", the main difference between "real mode" and "protected mode" inside the i80386 CPU seems to be how the "hidden" bits of a segment register are updated when the value of that register is changed. (There are also other differences, for example in interrupt handling.)

    If this is true, setting or clearing bit 0 of CR0 has (nearly) no effect at all until a segment register is changed (by performing ljmp, mov ds,ax ... or an interrupt).
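
    The sequence described in the question can be sketched as follows. This is only an outline under several assumptions: all labels (enter_pm, start32, boot_gdt, boot_gdt_ptr) are hypothetical, the code is assumed to be loaded below 64 KiB with ds = 0, and A20 is assumed to be enabled already.

    ```asm
            .text
            .code16
            .globl  enter_pm
    enter_pm:
            cli                             # no interrupts while switching
            lgdt    boot_gdt_ptr            # load the GDT register (needs ds = 0 here)
            movl    %cr0, %eax
            orl     $1, %eax                # set PE, bit 0 of CR0
            movl    %eax, %cr0

            # PE is set now, but the hidden part of cs has not been reloaded,
            # so the following bytes are still decoded as 16-bit code.
            ljmp    $0x08, $start32         # reloads cs from the GDT; its D bit selects 32-bit decoding

            .code32
    start32:
            movw    $0x10, %ax              # selector of the flat data descriptor
            movw    %ax, %ds
            movw    %ax, %es
            movw    %ax, %ss                # a real kernel would also set up esp here
    1:      hlt
            jmp     1b

            .align  8
    boot_gdt:
            .quad   0x0000000000000000      # null descriptor
            .quad   0x00CF9A000000FFFF      # selector 0x08: flat 32-bit code
            .quad   0x00CF92000000FFFF      # selector 0x10: flat 32-bit data
    boot_gdt_ptr:
            .word   boot_gdt_ptr - boot_gdt - 1     # size of the GDT minus 1
            .long   boot_gdt                        # linear address of the GDT
    ```

    The instructions between writing CR0 and the ljmp are assembled under .code16, which matches how the CPU still decodes them at that point; everything after the ljmp target is assembled under .code32.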