I am running a simulated RV64GC core in QEMU and am trying to better understand the virtual memory subsystem and address translation process in RISC-V. My simulated system runs with OpenSBI, the Linux Kernal v5.5, and a minimal rootfs.
In QEMU debug traces, I see that sometimes (most commonly with ecalls) control is passed to the SBI and the addresses change from kernel (virtual?) addresses with an offset of 0xffffffe000000000 into something that looks like real, physical, addresses in RAM. For example,
...
0xffffffe00003a192: 00000073 ecall
...
IN: sbi_ecall_0_1_handler
0x0000000080004844: 00093603 ld a2,0(s2)
0x0000000080004848: 4785 addi a5,zero,1
0x000000008000484a: 00a797b3 sll a5,a5,a0
...
In the RISC-V privileged specification version 1.11, section 4.1.12, the satp CSR (control and state register) is defined to have a MODE field that determines address translation designation. A MODE of 0 means that translation is bare (addresses are considered physical), a MODE of 8 or 9 requires Sv39 or Sv48 page-based virtual addressing, respectively, and any other MODE values are reserved.
Now, both the RISC-V privileged and unprivileged specifications don't seem to mention when satp may be changed (other than with csrrw), so this leads me to the following questions:
When control is handed to the SBI (as with the ecall above), does the satp MODE change to 0? If yes, does this mean the satp mode should be reset on a u/s/mret instruction? Are there other instances (other than csrrw) where satp is supposed to change?
If not, is there some other mechanism by which the addresses are interpreted and designated as physical? Or are the addresses (the 0x80XXXXXX addresses above) instead considered virtual and should go through the usual virtual address translation process (as outlined in section 4.3.2 of the RISC-V privileged specification)? If this is the case, when are page table entries created for this?
The memory model of RISC-V works in the following way:
M mode has its own memory protection system described under section 3.6 of privileged specifications called PMP (physical memory protection). This is to impose memory protection on lower privilege levels and also M mode itself (if lock bit is used). There is no virtual memory system in M mode.
Now in the S mode, it has page based virtual memory system that S mode can use to set virtual to physical address mapping and also to impose memory restrictions on S mode itself and also U mode.
So each privilege level has control on its own resources and the resources below it but never on the resources of the privilege level above it. This is how things work.
M mode can control memory accessible by M, S and U modes, and S mode can control memory view (virtual memory) and accessibility of S and U modes but not M mode. So satp mode never even changes when moving to M mode. As the mapping pointed by it is never even applicable to M mode (except the mstatus.MPRV bit is set). It has its on memory protection unit.
This would be huge security hole if lower privilege levels could impose memory restrictions on higher privilege levels.