assemblyx86qemuosdevmultiboot

QEMU doesn't reboot my OS after resetting via the 8042 PS/2 controller


@MichaelPetch has rewritten the entire question to reduce it to a specific problem that should be easily reproduced. The original question focussed on a problem encountered doing OS development in 64-bit long mode. The code was attempting to use 8042 PS/2 controller to reboot a machine but it failed to work on QEMU, although it did work in BOCHS. The original code can be found in this Github project.

Michael determined that the problem was not long mode specific. The problem space was substantially reduced to better illustrate the core issue.


For this demonstration I am:

The code for this demonstration is as follows:

bootloader.asm:

[BITS 32]

section .mboot
mboot_header_start:
    dd 0xe85250d6
    dd 0
    dd mboot_header_end - mboot_header_start
    dd 0x100000000 - (0xe85250d6 + 0 +(mboot_header_end - mboot_header_start))

    align 8
mboot_inforeq_start:
    dw 1
    dw 0
    dd mboot_inforeq_end - mboot_inforeq_start
    dd 2
    dd 6
    dd 8
mboot_inforeq_end:

    align 8
mboot_end_start:
    dw 0
    dw 0
    dd mboot_end_end - mboot_end_start
mboot_end_end:

mboot_header_end:

section .text
global _start
_start:
    mov word [0xb8000], (0x5f << 8) | 'B'
    mov word [0xb8002], (0x5f << 8) | 'O'
    mov word [0xb8004], (0x5f << 8) | 'O'
    mov word [0xb8006], (0x5f << 8) | 'T'

    ; Delay after writing to the screen so it appears for a bit of time before reboot
    mov ecx, 0xfffff
delay:
    loop delay

    ; Wait until the 8042 PS/2 Controller is ready to be sent a command
wait_cmd_ready:
    in al, 0x64
    test al, 00000010b
    jne  wait_cmd_ready

    ; Use 8042 PS/2 Controller to reboot the machine
    mov al, 0xfe
    out 0x64, al

    ; If this is displayed the reboot wasn't successful. Shouldn't get this far
    mov word [0xb8000+160], (0x5f << 8) | 'N'
    mov word [0xb8002+160], (0x5f << 8) | 'O'
    mov word [0xb8004+160], (0x5f << 8) | 'R'
    mov word [0xb8006+160], (0x5f << 8) | 'E'
    mov word [0xb8006+160], (0x5f << 8) | 'B'

    ; Infinite loop to end
hltloop:
    hlt
    jmp hltloop

link.ld:

ENTRY(_start);

kern_vma = 0x100000;

SECTIONS
{
    . = 0x500;
    .boot :
    {
        *(*.mboot*)
    }
    . = kern_vma;
    .text ALIGN(4K) :
    {
         *(*.text*)
    }
    .bss ALIGN(4K) :
    {
         *(.bss)
    }
}

My Linux build script is:

#!/bin/sh

ISO_DIR="isodir"
ISO_NAME="myos"
GRUB_CFG="grub.cfg"
KERNEL_NAME="bootloader"

nasm -f elf32 bootloader.asm -o bootloader.o
ld -m elf_i386 -T link.ld bootloader.o -o $KERNEL_NAME.elf

mkdir -p $ISO_DIR/boot/grub
cp $KERNEL_NAME.elf $ISO_DIR/boot/

echo 'set timeout=2'                          >  $ISO_DIR/boot/grub/$GRUB_CFG
echo 'set default=0'                          >> $ISO_DIR/boot/grub/$GRUB_CFG
echo 'menuentry "My Kernel" {'                >> $ISO_DIR/boot/grub/$GRUB_CFG
echo '  multiboot2 /boot/'$KERNEL_NAME'.elf'  >> $ISO_DIR/boot/grub/$GRUB_CFG
echo '}'                                      >> $ISO_DIR/boot/grub/$GRUB_CFG

# build iso image
grub-mkrescue -o $ISO_NAME.iso $ISO_DIR/

The Problem

When I run the build script and run it in QEMU with this command:

qemu-system-i386 -cdrom myos.iso

GRUB boots the kernel and BOOT is properly displayed with white on magenta attributes in the upper left hand corner of the window. It should briefly wait before rebooting the machine load GRUB and repeat the cycle.

It doesn't do what is expected. It displays BOOT as it should but QEMU appears to sit and do nothing which is not what I expect.

If I run QEMU with the additional option -d int I find that the machine appears to be in a perpetual loop consisting of an invalid opcode exception (v=06), a general protection fault (v=0d), and ends with a double fault (v=08). The output usually looks something like:

     0: v=06 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=00000000
EAX=00000000 EBX=00000000 ECX=00000010 EDX=000f171d
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fc0
EIP=000f0000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6080 00000037
IDT=     000f60be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=0000007f CCO=ADDB
EFER=0000000000000000
check_exception old: 0xffffffff new 0xd
     1: v=0d e=0032 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=0000
0000
EAX=00000000 EBX=00000000 ECX=00000010 EDX=000f171d
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fc0
EIP=000f0000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6080 00000037
IDT=     000f60be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=0000007f CCO=ADDB
EFER=0000000000000000
check_exception old: 0xd new 0xd
     2: v=08 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=0000
0000
EAX=00000000 EBX=00000000 ECX=00000010 EDX=000f171d
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fc0
EIP=000f0000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6080 00000037
IDT=     000f60be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=0000007f CCO=ADDB
EFER=0000000000000000
check_exception old: 0x8 new 0xd
check_exception old: 0xffffffff new 0x6

It keeps repeating a similar pattern. What is unusual is that it seems to be stuck in this cycle of exceptions:

0: v=06 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=00000000
1: v=0d e=0032 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=0000
2: v=08 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=0000

What is causing this problem and how can I fix it so that QEMU will properly reboot and launch GRUB again?


Solution

  • The original Operating System code required effort to determine the root cause of this problem. The example makes it easier for people to potentially spot the culprit. Finding this problem in the original code took some effort.


    The primary problem is that you have placed the Multiboot section (with the Multiboot2 header) at Virtual Memory Address (VMA) 0x500 in the linker script with this:

    SECTIONS
    {
        . = 0x500;
        .boot :
        {
            *(*.mboot*)
        }
        . = kern_vma;
        .text ALIGN(4K) :
        {
             *(*.text*)
        }
        .bss ALIGN(4K) :
        {
             *(.bss)
        }
    }
    

    The VMA and Load Memory Address (LMA) are both set to 0x500 when the .boot section is emitted by the linker. The problem is that GRUB will attempt to load this section at memory address 0x500 when reading your ELF executable. Placing any code and data below 0x100000 is a very bad idea. GRUB will use this area of memory for doing all its boot related tasks including loading your kernel. You may be inadvertently overwriting memory that GRUB is using, potentially putting the machine in an undefined state. It may work on some machines running GRUB and not others. Problems may manifest in other ways.

    The only rules from the Multiboot2 specification about the location of the Multiboot2 header is that it be aligned on a Quadword Word boundary (64-bit) and fall within the first 32,768 bytes of the ELF file. You don't need to place the Multiboot2 header below 0x100000. That is unnecessary. Place it before your code and data above 0x100000. This should work:

    ENTRY(_start);
    
    kern_vma = 0x100000;
    
    SECTIONS
    {
        . = kern_vma;
        .boot :
        {
            *(*.mboot*)
        }
        .text ALIGN(4K) :
        {
             *(*.text*)
        }
        .bss ALIGN(4K) :
        {
             *(.bss)
        }
    }
    

    In your case writing to memory below 0x100000 at 0x500 prevents QEMU from rebooting properly which in turn prevents GRUB from restarting. The exact nature of the failure I'm not sure of, but the results are pretty apparent in this case.

    If booting with the original Multiboot specification the same thing applies, although the Multiboot header only has to be aligned on a Double Word boundary (32-bit) and be within the first 8192 bytes of the ELF executable.