I'm following a book to write a Linux-like kernel, but I've run into problems with the APIC chapter.
Before anything else, here is my platform: I'm on Windows 10, using VirtualBox to run Ubuntu 18.04, and I run the test code on Bochs inside it.
Currently my understanding of the APIC is as follows:
1, There is a built-in Local APIC on each core and an I/O APIC on the motherboard
2, The Local APIC can be accessed using memory mapping or MSRs
3, The I/O APIC is accessed through 3 registers: IOREGSEL, IOWIN and EOI. The basic idea is to set the value of IOREGSEL and then access the corresponding register through IOWIN.
4, There are 3 modes; the one of interest is Symmetric I/O mode
5, The I/O APIC has 24 pins; pin 1 is linked to the keyboard
6, To enable the Local APIC and the I/O APIC, there is a series of steps to do:
a) Mask the 8259A interrupts
b) Enable xAPIC and x2APIC, so that MSR access is possible
c) Mask all LVT entries (if local interrupts are not needed)
d) Set up the RTE entries of the I/O APIC
e) Set the IMCR register to 01h, forcing the 8259A interrupt signals to pass through the I/O APIC
f) Find the Other Interrupt Control Register (OIC) through the Root Complex Base Address Register (RCBA), and set OIC[8]=1b to enable the I/O APIC
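To make point 3 concrete, here is a minimal sketch of the IOREGSEL/IOWIN indirect access, written as hosted C so it can be compiled outside the kernel. The names ioapic_write, ioapic_read and ioapic_rte_index are hypothetical helpers, not functions from the book, and the sketch assumes the I/O APIC window is already mapped uncached at `base`.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_IOREGSEL 0x00 /* byte offset of the index register */
#define IOAPIC_IOWIN    0x10 /* byte offset of the data window */
#define IOAPIC_REDTBL   0x10 /* index of the low dword of the RTE for pin 0 */
#define IOAPIC_PINS     24   /* pin count assumed from the list above */

/* Select a register through IOREGSEL, then write it through IOWIN. */
static inline void ioapic_write(volatile uint32_t *base, uint8_t index, uint32_t value)
{
    base[IOAPIC_IOREGSEL / 4] = index;
    base[IOAPIC_IOWIN / 4] = value;
}

/* Select a register through IOREGSEL, then read it through IOWIN. */
static inline uint32_t ioapic_read(volatile uint32_t *base, uint8_t index)
{
    base[IOAPIC_IOREGSEL / 4] = index;
    return base[IOAPIC_IOWIN / 4];
}

/* Each RTE is 64 bits wide, so it occupies two consecutive indices. */
static inline uint8_t ioapic_rte_index(unsigned pin)
{
    assert(pin < IOAPIC_PINS);
    return (uint8_t)(IOAPIC_REDTBL + 2 * pin);
}
```

With this layout the keyboard pin (pin 1) uses index 0x12 for the low dword and 0x13 for the high dword.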
Now I'll present my questions:
1, On both Bochs and VirtualBox, the Max LVT Entry number is detected as 6 (according to the manual, there are 6+1=7 LVT entries), yet the LVT_CMCI entry cannot be accessed (#GP fault).
2, It is said that different chipsets map the RCBA to different ports, and that I would have to look it up in the manuals. But is there a way to detect it in software? Otherwise, how do commercial OSes fit different platforms?
3, Since I'm on a virtual machine, how can I detect whether the RCBA is accessible?
Thanks to anyone who can provide a clue to my questions or help me understand more about this chapter.
Below is some of my code that sets up the APIC for a simple keyboard interrupt.
First, the interrupt handling function:
void IRQ0x21_interrupt(Int_Info_No_Err STK)
{
Ent_Int;
color_printk(RED,BLACK,"do_IRQ: 0x21\t");
unsigned char x;
x = io_in8(0x60);
color_printk(RED,BLACK,"key code:%#08x\n",x);
wrmsr(0x80b, 0UL);
//io_out8(0x20,0x20);
Ret_Int;
}
Ret_Int and Ent_Int are macros defined to handle the interrupt stack; the wrmsr() call writes 0 to MSR address 0x80b (EOI).
Next is the setup function for the Local APIC and the I/O APIC, assuming that physical address 0xFEC00000 is already mapped in the page table:
void APIC_init(void)
{
int i;
volatile unsigned char *virtual_index_address; //IOREGSEL
volatile unsigned int *virtual_data_address; //IOWIN
volatile unsigned int *virtual_EOI_address; //EOI
unsigned long tmp;
//Set interrupt, note No.33 link to IRQ0x21_interrupt() function
for(i = 32;i < 56;i++)
{
_Set_INT(IDT_PTR.Offset + i, ATTR_INTR_GATE, 2, interrupt[i - 32]);
}
//Mask 8259A
io_out8(0x21,0xff);
io_out8(0xa1,0xff);
//enable IMCR
io_out8(0x22,0x70);
io_out8(0x23,0x01);
#pragma region Init_LAPIC
//Enabling xAPIC(IA32_APIC_BASE[11]) and x2APIC(IA32_APIC_BASE[10])
tmp = rdmsr(0x1b);
tmp |= ((1UL << 10) | (1UL << 11));
wrmsr(0x1b,tmp);
//Enabling LAPIC(SVR[8])
tmp = rdmsr(0x80f);
tmp |= (1UL << 8); //No support for EOI broadcast, no need to set bit SVR[12]
wrmsr(0x80f,tmp);
//Mask all LVT
tmp = 0x10000;
//wrmsr(0x82F, tmp); LVT_CMCI: raises #GP on the virtual machines, so it is skipped
wrmsr(0x832, tmp);
wrmsr(0x833, tmp);
wrmsr(0x834, tmp);
wrmsr(0x835, tmp);
wrmsr(0x836, tmp);
wrmsr(0x837, tmp);
#pragma endregion
#pragma region Init_IOAPIC
virtual_index_address = (unsigned char*)(0xFEC00000 + PAGE_OFFSET);
virtual_data_address = (unsigned int*)(0xFEC00000 + PAGE_OFFSET + 0x10);
virtual_EOI_address = (unsigned int*)(0xFEC00000 + PAGE_OFFSET + 0x40);
//Setting RTEs: mask every pin (bit 16), vector = 0x20 + pin number
for(i = 0x10;i < 0x40;i += 2){
tmp = 0x10020UL + ((i - 0x10) >> 1);
*virtual_index_address = i; //low dword of the RTE
io_mfence;
*virtual_data_address = tmp & 0xffffffff;
io_mfence;
*virtual_index_address = i + 1; //high dword of the RTE
io_mfence;
*virtual_data_address = (tmp >> 32) & 0xffffffff;
io_mfence;
}
//Unmask pin 1 (keyboard): RTE index 0x12/0x13, vector 0x21, mask bit clear
tmp = 0x21UL;
*virtual_index_address = 0x12;
io_mfence;
*virtual_data_address = tmp & 0xffffffff;
io_mfence;
*virtual_index_address = 0x13; //was i + 1, which is 0x41 after the loop
io_mfence;
*virtual_data_address = (tmp >> 32) & 0xffffffff;
io_mfence;
#pragma endregion
}
So, according to the answers, the I/O APIC is enabled once I complete the initialization of the RTEs. Could anyone kindly tell me whether the above code would work (for a simple keyboard interrupt)? Thank you so much.
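For reference, the constant 0x10020 in the code above can be read as "mask bit (bit 16) set, vector 0x20". The sketch below assembles a redirection table entry from its fields under that reading; ioapic_rte, rte_low and rte_high are hypothetical helper names, and only fixed delivery, physical destination, edge-triggered, active-high entries (all of those fields zero) are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTE_MASKED     (1ULL << 16) /* interrupt mask bit */
#define RTE_DEST_SHIFT 56           /* destination APIC ID lives in bits 56..63 */

/* Build a 64-bit RTE: fixed delivery, physical mode, edge, active high. */
static inline uint64_t ioapic_rte(uint8_t vector, bool masked, uint8_t dest_apic_id)
{
    assert(vector >= 32); /* vectors 0..31 are reserved for CPU exceptions */
    uint64_t rte = vector;
    if (masked)
        rte |= RTE_MASKED;
    rte |= (uint64_t)dest_apic_id << RTE_DEST_SHIFT;
    return rte;
}

/* The two 32-bit halves, as written through IOWIN. */
static inline uint32_t rte_low(uint64_t rte)  { return (uint32_t)(rte & 0xffffffffu); }
static inline uint32_t rte_high(uint64_t rte) { return (uint32_t)(rte >> 32); }
```

Under these assumptions a masked entry for vector 0x20 is 0x10020, while an unmasked keyboard entry for vector 0x21 is simply 0x21, so the keyboard pin only delivers interrupts when bit 16 is clear.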
1, On both Bochs and VirtualBox, the Max LVT Entry number is detected as 6 (according to the manual, there are 6+1=7 LVT entries), yet the LVT_CMCI entry cannot be accessed (#GP fault).
Intel documents seven LVT entries in its Software Developer's Manual (section 10.5.1), but that only reflects the current state of the hardware.
The LVT performance counter register and its associated interrupt were introduced in the P6 processors and are also present in the Pentium 4 and Intel Xeon processors.
The LVT thermal monitor register and its associated interrupt were introduced in the Pentium 4 and Intel Xeon processors.
The LVT CMCI register and its associated interrupt were introduced in the Intel Xeon 5500 processors.
If you consider P6 and Pentium 4 processors obsolete, you can always assume there are at least six LVT entries.
The Xeon 5500 series is based on Nehalem, which is the ancestor of the modern generations of CPUs, and it dates back to 2008.
Accessing an invalid LAPIC register in x2APIC mode (i.e. through MSRs) generates a #GP, as accessing any non-existent MSR does.
Using the legacy MMIO interface and staying inside the region claimed by the LAPIC (up to offset 0x3f0) will instead set bit 7 (Illegal Register Address) in the LAPIC ESR register.
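As a small illustration of how the two interfaces relate, the x2APIC MSR number of a register can be derived from its legacy MMIO offset. This is a sketch with a hypothetical helper name (x2apic_msr); it does not cover the exceptions, e.g. the ICR, which becomes a single 64-bit MSR at 0x830 in x2APIC mode.

```c
#include <assert.h>
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u

/* Map a 16-byte aligned legacy LAPIC offset to its x2APIC MSR number. */
static inline uint32_t x2apic_msr(uint32_t mmio_offset)
{
    assert((mmio_offset & 0xFu) == 0 && mmio_offset <= 0x3F0u);
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```

For example, the LVT_CMCI register at offset 0x2F0 maps to MSR 0x82F, which is the access that faults in the question.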
Bochs doesn't handle the LVT_CMCI register; there is literally no support for it in the source code.
That repo is possibly out of sync with the current source, but my build of Bochs (fairly recent) still doesn't support it.
The switch on the register offset was already present back in 2007, before the Xeon 5500, so either the author forgot to update it or decided it wasn't worth supporting MCE.
I haven't checked VirtualBox, but considering that MCE and the more general MCA machinery are quite complex, there is probably no support for it.
Simply put, LVT_CMCI is optional. You can check its presence using the normal MMIO interface and the ESR register.
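A minimal sketch of that check, assuming the LAPIC is still in xAPIC (MMIO) mode and mapped at `lapic`; the helper names are hypothetical. The ESR has to be written before it is read: the first write clears stale errors, the second latches whatever the probe access produced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAPIC_VERSION        0x030
#define LAPIC_ESR            0x280
#define LAPIC_LVT_CMCI       0x2F0
#define ESR_ILLEGAL_REGISTER (1u << 7)

/* Max LVT Entry field of the version register (number of entries minus 1). */
static inline unsigned lapic_max_lvt(uint32_t version_reg)
{
    return (version_reg >> 16) & 0xFFu;
}

/* True when an ESR value reports an access to a non-existent register. */
static inline bool esr_illegal_register(uint32_t esr)
{
    return (esr & ESR_ILLEGAL_REGISTER) != 0;
}

/* Probe LVT_CMCI through MMIO and report whether the access was legal. */
static inline bool lapic_has_lvt_cmci(volatile uint32_t *lapic)
{
    assert(lapic != 0);
    lapic[LAPIC_ESR / 4] = 0;            /* discard previously logged errors */
    (void)lapic[LAPIC_LVT_CMCI / 4];     /* the probe access itself */
    lapic[LAPIC_ESR / 4] = 0;            /* latch errors raised since the last write */
    return !esr_illegal_register(lapic[LAPIC_ESR / 4]);
}
```

Whether an emulator actually sets the ESR bit for an unsupported register is up to its implementation, so treat the result as a hint and still be prepared for a #GP when using the MSR interface.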
2, It is said that different chipsets map the RCBA to different ports, and that I would have to look it up in the manuals. But is there a way to detect it in software? Otherwise, how do commercial OSes fit different platforms?
The IO APICs are reported to the OS through the ACPI tables: specifically, the Multiple APIC Description Table (MADT), described in section 5.2.12 of the ACPI specification, contains the MMIO addresses of the IO APICs.
Alternatively, if present, the Intel MP Table can be used.
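As a rough sketch of the ACPI route, the function below walks the variable-length entries of a MADT and collects the IO APIC entries (type 1). It assumes the table has already been located and mapped; struct ioapic_info and madt_find_ioapics are hypothetical names, and checksum validation is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MADT_ENTRIES_OFFSET 44 /* 36-byte SDT header + LAPIC address + flags */
#define MADT_TYPE_IOAPIC    1

struct ioapic_info {
    uint8_t  id;        /* IO APIC ID */
    uint32_t address;   /* physical MMIO base */
    uint32_t gsi_base;  /* first global system interrupt it handles */
};

/* ACPI tables are little-endian; read a dword without alignment assumptions. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the number of IO APIC entries stored in out (at most max). */
static size_t madt_find_ioapics(const uint8_t *madt, struct ioapic_info *out, size_t max)
{
    assert(memcmp(madt, "APIC", 4) == 0); /* MADT signature */
    uint32_t total = read_le32(madt + 4); /* length of the whole table */
    size_t off = MADT_ENTRIES_OFFSET, found = 0;

    while (off + 2 <= total) {
        uint8_t type = madt[off], len = madt[off + 1];
        if (len < 2 || off + len > total)
            break; /* malformed entry, stop walking */
        if (type == MADT_TYPE_IOAPIC && len >= 12 && found < max) {
            out[found].id = madt[off + 2];
            out[found].address = read_le32(madt + off + 4);
            out[found].gsi_base = read_le32(madt + off + 8);
            found++;
        }
        off += len;
    }
    return found;
}
```

On a typical single IO APIC system this yields one entry whose address is the familiar 0xFEC00000, but the point of reading the table is precisely not to hard-code that value.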
The software doesn't need to know about the hardware to get to the IO APICs. In fact, the RCBA mechanism is quite inconsistent at the hardware level.
In current x86 systems there is always an IO APIC in the PCH (Platform Controller Hub), and there is also an IO APIC in the uncore of some multi-socket server CPUs (the E5 and E7 series, along with the Xeon 5500, have it; Xeon Scalable could/should, but there is no detailed datasheet).
Finally, an IO APIC could be provided by other means, like in a PCI hub (e.g. the Intel PXH).
The IO APIC in the PCH Series 7, used at the time with Ivy Bridge processors (around 2012) follows the RCBA pattern:
The OIC is located at offset 0x31FE in the RCBA and the RCBA is at offset 0xF0 in the PCI config space of the PCI-to-LPC bridge (device 1f.0).
There is no particular link between the RCBA and the LPC interface, evidently Intel used this device for internal reasons.
Since this is all documented, the OS can get the RCBA and the OIC address, provided it recognises the chipset.
The same is true for the series 8 (Haswell).
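The address arithmetic for this pattern can be sketched as follows. The offsets 0xF0 and 0x31FE and the device 1f.0 come from the text above; the 0xCF8 address layout is the legacy PCI configuration mechanism; the assumption that the RCBA base occupies bits 31:14 (with bit 0 as an enable bit) is my reading of the datasheets and should be checked against the one for your chipset. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LPC_DEV       0x1F   /* PCI-to-LPC bridge, device 1f.0 */
#define LPC_RCBA_REG  0xF0   /* RCBA register in its config space */
#define RCBA_OIC_OFF  0x31FE /* OIC offset inside the RCBA window */
#define OIC_AEN       (1u << 8)

/* Value to write to port 0xCF8 before reading the dword at port 0xCFC. */
static inline uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    assert(dev < 32 && fn < 8);
    return 0x80000000u | (uint32_t)bus << 16 | (uint32_t)dev << 11 |
           (uint32_t)fn << 8 | (reg & 0xFCu);
}

/* Assumption: bits 31:14 of the RCBA register hold the base address. */
static inline uint32_t rcba_base(uint32_t rcba_reg)
{
    return rcba_reg & 0xFFFFC000u;
}

/* Physical address of the 16-bit OIC register. */
static inline uint32_t oic_address(uint32_t rcba_reg)
{
    return rcba_base(rcba_reg) + RCBA_OIC_OFF;
}

/* New OIC value with the IO APIC enable bit (OIC[8]) set. */
static inline uint16_t oic_enable_ioapic(uint16_t oic)
{
    return (uint16_t)(oic | OIC_AEN);
}
```

The port I/O and the MMIO mapping of the RCBA window are deliberately left out, since they depend on the kernel's own primitives.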
Starting from the series 100 of the PCH (coupled with Skylake), the IO APIC in the PCH is controlled by the P2SB (Primary to Sideband) controller; this is device 1f.1 (valid up to the C620 series, the latest at the time of writing).
The P2SB can be hidden from software by setting bit 8 of register 0xE0 in its PCI config space; this makes all PCI config reads return ones.
Writes, at least to 0xE0, are still accepted; in fact I've "de-hidden" the P2SB in my system and checked its configuration.
Register 0x64 in its PCI config space works like the OIC register (though it's called IOAC).
On the server side, some (most?) of Intel's processors have an IO APIC integrated in the uncore.
This appears as a PCI device (unlike the client-side IO APIC; there is also a PCI class for IO APICs).
It can use the standard PCI BAR mechanism (the register is named MBAR), thus it could be mapped anywhere in the lower 4GiB and not only at 0xFECx xxxx.
It also has an ABAR register that works similarly to the IOAC register.
This pattern seems to hold for all IO APICs appearing as PCI devices (e.g. those in the PXH hubs).
In a server, the PCH also has an IO APIC; however, more configuration is needed to let the system route the requests correctly to the IO APIC behind DMI.
All these details are aimed at BIOS programmers more than OS programmers; the reliable way is to use the ACPI tables or the MP table (if neither exists, the system is not SMP and no IO APIC is needed).
3, Since I'm on a virtual machine, how can I detect whether the RCBA is accessible?
This was partly or totally addressed in the answer to point 2 (i.e. either there is no RCBA or it is in the PCI-to-LPC config space at 0xf0).
If you are using VirtualBox, you can select either a PIIX3 or an ICH9 chipset.
For PIIX3 there is no RCBA (too old) and the APIC base has the form FEC0_xy00h, where xy can be configured at address 0x80 of the config space of device 00.0.
I've only skimmed the datasheet, but it appears that the IO APIC is an external component and that this setting determines when to assert the IO APIC-specific pins.
For the ICH9 the RCBA is in the PCI-to-LPC bridge. So a simple way to read it under Linux is sudo setpci -s 1f.0 F0.L (but check the syntax).
Note that both components are from the pre-PCH era.