linux-kernel, qemu, pci, hypervisor, mmu

ARMv8A hypervisor - PCI MMU fault


I am trying to implement a minimal hypervisor on ARMv8-A (Cortex-A53 on QEMU version 6.2.0). I have written minimal hypervisor code that runs in EL2, and Linux boots successfully in EL1. Now I want to enable the stage-2 MMU. I have written basic stage-2 page tables (only the entries necessary to map the 1 GB of RAM). If I disable PCI in the DTB, the kernel boots successfully. The QEMU command line is given below.

qemu-system-aarch64 -machine virt,gic-version=2,virtualization=on -cpu cortex-a53 -nographic -smp 1 -m 4096 -kernel hypvisor/bin/hypervisor.elf -device loader,file=linux-5.10.155/arch/arm64/boot/Image,addr=0x80200000 -device loader,file=1gb_1core.dtb,addr=0x88000000
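For context, the kind of stage-2 mapping I have in mind looks roughly like the sketch below (simplified and illustrative rather than my actual code; the 4 KB granule, 39-bit IPA size, descriptor encodings, addresses and names are assumptions based on the QEMU virt memory map):

    /* Simplified sketch, not my actual code: identity-mapped stage-2 built from
     * one level-1 table of 1 GB block descriptors. Assumes a 4 KB granule and a
     * 39-bit IPA space (VTCR_EL2.T0SZ = 25, starting level 1), so 512 entries
     * cover IPAs 0 .. 512 GB. All names and addresses are illustrative. */
    #include <stdint.h>

    #define S2_BLOCK           0x1ULL         /* valid level-1 block descriptor */
    #define S2_AF              (1ULL << 10)   /* access flag                    */
    #define S2_SH_INNER        (3ULL << 8)    /* inner shareable                */
    #define S2_S2AP_RW         (3ULL << 6)    /* stage-2 read/write             */
    #define S2_MEMATTR_NORMAL  (0xFULL << 2)  /* normal memory, write-back      */
    #define S2_MEMATTR_DEVICE  (0x0ULL << 2)  /* Device-nGnRnE                  */

    #define S2_NORMAL (S2_BLOCK | S2_MEMATTR_NORMAL | S2_S2AP_RW | S2_SH_INNER | S2_AF)
    #define S2_DEVICE (S2_BLOCK | S2_MEMATTR_DEVICE | S2_S2AP_RW | S2_AF)

    static uint64_t s2_level1[512] __attribute__((aligned(4096)));

    static void s2_map_1gb(uint64_t ipa, uint64_t pa, uint64_t attrs)
    {
        s2_level1[(ipa >> 30) & 0x1ff] = (pa & ~((1ULL << 30) - 1)) | attrs;
    }

    void s2_init(void)
    {
        /* first GB of the virt board: flash, GIC, UART, low PCIe windows */
        s2_map_1gb(0x00000000ULL,   0x00000000ULL,   S2_DEVICE);
        /* RAM (virt RAM base is 0x4000_0000; the Image above sits at 0x80200000) */
        s2_map_1gb(0x40000000ULL,   0x40000000ULL,   S2_NORMAL);
        s2_map_1gb(0x80000000ULL,   0x80000000ULL,   S2_NORMAL);
        /* high PCIe ECAM region at 0x40_1000_0000 (see the DTB excerpt below) */
        s2_map_1gb(0x4000000000ULL, 0x4000000000ULL, S2_DEVICE);
        /* then point VTTBR_EL2 at s2_level1, program VTCR_EL2, set HCR_EL2.VM */
    }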

When the PCI is enabled in DTB, I am getting a kernel panic as shown below.

[    0.646801] pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
[    0.647909] Unable to handle kernel paging request at virtual address 0000000093810004
[    0.648109] Mem abort info:
[    0.648183]   ESR = 0x96000004
[    0.648282]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.648403]   SET = 0, FnV = 0
[    0.648484]   EA = 0, S1PTW = 0
[    0.648568] Data abort info:
[    0.648647]   ISV = 0, ISS = 0x00000004
[    0.648743]   CM = 0, WnR = 0
[    0.648885] [0000000093810004] user address but active_mm is swapper
[    0.653399] Call trace:
[    0.653598]  pci_generic_config_read+0x38/0xe0
[    0.653729]  pci_bus_read_config_dword+0x80/0xe0
[    0.653845]  pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
[    0.653974]  pci_bus_read_dev_vendor_id+0x4c/0x70
[    0.654090]  pci_scan_single_device+0x80/0x100

I set a GDB breakpoint in 'pci_generic_config_read' and observed that the faulting instruction is

>0xffff80001055d5c8 <pci_generic_config_read+56> ldr     w1, [x0]

The value of register X0 is given below

(gdb) p /x $x0
$4 = 0xffff800020000000

The host hardware is configured with 4GB in total, and the Linux guest is given 1GB through the command line and DTB. This is a single-core system with KASLR disabled.

Excerpt from the DTB containing PCI part is given below.

    pcie@10000000 {
        interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
        interrupt-map = <0x00 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x03 0x04 0x00 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x04 0x04 0x00 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x05 0x04 0x00 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x06 0x04 0x800 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x04 0x04 0x800 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x05 0x04 0x800 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x06 0x04 0x800 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x03 0x04 0x1000 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x05 0x04 0x1000 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x06 0x04 0x1000 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x03 0x04 0x1000 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x04 0x04 0x1800 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x06 0x04 0x1800 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x03 0x04 0x1800 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x04 0x04 0x1800 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x05 0x04>;
        #interrupt-cells = <0x01>;
        ranges = <0x1000000 0x00 0x00 0x00 0x3eff0000 0x00 0x10000 0x2000000 0x00 0x10000000 0x00 0x10000000 0x00 0x2eff0000 0x3000000 0x80 0x00 0x80 0x00 0x80 0x00>;
        reg = <0x40 0x10000000 0x00 0x10000000>;
        msi-parent = <0x8002>;
        dma-coherent;
        bus-range = <0x00 0xff>;
        linux,pci-domain = <0x00>;
        #size-cells = <0x02>;
        #address-cells = <0x03>;
        device_type = "pci";
        compatible = "pci-host-ecam-generic";
    };

If my interpretation of the DTB is right, the PCI controller's config space is mapped at address 0x40_1000_0000 with size 0x1000_0000 (256MB); that is, it starts just above 256GB in the physical address space.
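For reference, with the pci-host-ecam-generic binding each bus/device/function gets a fixed 4 KB configuration window at a computable offset from that base, so the bus scan's first config read (the vendor ID of bus 0, device 0, function 0) targets the first word of that region. A small illustrative helper (not the kernel's code):

    #include <stdint.h>

    /* ECAM layout used by "pci-host-ecam-generic": the config space of a
     * function lives at base + (bus << 20 | device << 15 | function << 12),
     * with the register number in the low 12 bits. Illustrative helper only. */
    static inline uint64_t ecam_cfg_addr(uint64_t ecam_base, uint32_t bus,
                                         uint32_t dev, uint32_t fn, uint32_t reg)
    {
        return ecam_base + (((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
                            ((uint64_t)fn << 12) | reg);
    }

    /* e.g. ecam_cfg_addr(0x4010000000ULL, 0, 0, 0, 0) == 0x4010000000,
     * the vendor-ID word that pci_bus_read_dev_vendor_id reads first. */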

I have also written a page table entry mapping this physical address range (as device memory).

Is it right for the PCI config space to be mapped at such a high physical address? Any hints on debugging this issue are greatly appreciated.


Solution

  • Yes, for a 64-bit CPU this is the expected place to find the PCI controller ECAM region. The virt board puts some "large" device memory regions beyond the 4GB mark (specifically, the PCIe ECAM region, a second PCIe MMIO window, and the GIC redistributors for CPUs above 123). (You can turn this off with -machine highmem=off if you like, as in the example command line at the end of this answer, though that will limit the amount of RAM you can give the VM to 3GB.)

    Depending on what your hypervisor is doing, you might or might not want the guest to be talking directly to the host PCI controller anyway.
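
    For example, adapting the command line from the question (with -m reduced to 3072 because of the 3GB limit mentioned above):

    qemu-system-aarch64 -machine virt,gic-version=2,virtualization=on,highmem=off -cpu cortex-a53 -nographic -smp 1 -m 3072 -kernel hypvisor/bin/hypervisor.elf -device loader,file=linux-5.10.155/arch/arm64/boot/Image,addr=0x80200000 -device loader,file=1gb_1core.dtb,addr=0x88000000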