
Does AArch64 need a DSB after creating a page table entry?


On a single-core AArch64 (ARMv8) platform, suppose we create a new page table entry and then immediately access the address it maps:

str x1, [x0] ;x1 is phy addr for pte, x0 is pte_entry
ldr x2, [x3] ;x3 has VA that is mapped by above instruction

My question: do the CPU and MMU handle this correctly on their own?
Or do we need to insert a dsb between these two instructions to enforce the memory access order?

str x1, [x0] ;x1 is phy addr for pte, x0 is pte_entry
dsb sy
ldr x2, [x3] ;x3 has VA that is mapped by above instruction

Solution

  • ARMv8 requires what Arm calls a "break-before-make" procedure when updating page table entries. The procedure is described in section G5.9.1 of the ARMv8 ARM:

    A break-before-make sequence on changing from an old translation table entry to a new translation table entry requires the following steps:

    1. Replace the old translation table entry with an invalid entry, and execute a DSB instruction.
    2. Invalidate the translation table entry with a broadcast TLB invalidation instruction, and execute a DSB instruction to ensure the completion of that invalidation.
    3. Write the new translation table entry, and execute a DSB instruction to ensure that the new entry is visible.

    This sequence ensures that at no time are both the old and new entries simultaneously visible to different threads of execution, and therefore the problems described at the start of this subsection cannot arise.
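As a rough illustration, the three quoted steps might be sketched in C with GCC/Clang inline assembly as below. Everything named here is an assumption made for the example and not taken from any particular kernel: the helper names (pte_replace, bbm_tlbi_va), the BBM_PRIVILEGED macro, the use of ASID 0, and the choice of tlbi vae1is as the broadcast invalidation. TLBI is privileged, so it is only emitted when BBM_PRIVILEGED is defined; on non-AArch64 hosts the barriers degrade to compiler barriers so the sketch still builds. The trailing isb is not part of the quoted steps.

```c
#include <stdint.h>

#if defined(__aarch64__)
#define bbm_dsb() __asm__ volatile("dsb sy" ::: "memory")
#define bbm_isb() __asm__ volatile("isb" ::: "memory")
#else
/* Assumption: on other hosts only a compiler barrier is used, so this builds. */
#define bbm_dsb() __asm__ volatile("" ::: "memory")
#define bbm_isb() __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical helper: broadcast-invalidate the TLB entry for one VA (ASID 0).
 * TLBI needs EL1 or higher, so it is compiled in only with BBM_PRIVILEGED. */
static inline void bbm_tlbi_va(uint64_t va)
{
#if defined(__aarch64__) && defined(BBM_PRIVILEGED)
    uint64_t arg = (va >> 12) & ((1ULL << 44) - 1); /* operand carries VA[55:12] */
    __asm__ volatile("tlbi vae1is, %0" : : "r"(arg) : "memory");
#else
    (void)va; /* no-op outside a privileged AArch64 build */
#endif
}

/* Hypothetical helper: replace a live entry using break-before-make. */
static inline void pte_replace(volatile uint64_t *slot, uint64_t va,
                               uint64_t new_desc)
{
    *slot = 0;          /* step 1: write an invalid entry (bit 0 clear) */
    bbm_dsb();
    bbm_tlbi_va(va);    /* step 2: invalidate any cached translation */
    bbm_dsb();          /*         and wait for the invalidation to complete */
    *slot = new_desc;   /* step 3: write the new entry */
    bbm_dsb();          /*         and make it visible to the table walker */
    bbm_isb();          /* extra: resynchronize this CPU's pipeline */
}
```

A caller would pass the address of the descriptor slot, the virtual address that slot translates, and the new descriptor value.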

    Now, since we're converting an invalid entry into a valid one, we can cheat a little and skip steps 1 and 2: the entry is already invalid, and we know invalid entries are never cached. That leaves just step 3, which requires a DSB to make the new entry visible.

    This is a lot of words to say that, yes, you do need a DSB here, even on a uniprocessor system. The page table walk unit is not a coherent, ordered agent, so it may observe stores in orders that the ARMv8 ISA would not otherwise allow. The correct sequence is this:

    str x0, [x1] // Write your table entry (invalid->anything else)
    dsb st       // Ensure table entry is visible to MMU
    isb sy       // Ensure the previous dsb has completed
    

    One critical thing to note is that we also include an isb after the dsb. This is because ARMv8 allows dsb instructions to be reordered, so even your second example, which does include the dsb, may behave incorrectly: the ldr x2 instruction can be reordered above the dsb (and likely will be, since dsb sy instructions are incredibly expensive and can stall for tens of thousands of cycles). My example also uses dsb st instead of dsb sy. This is a correct and lighter choice, since we generally don't need other system agents to synchronize with us when racing an access to an invalid page that is being updated.
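For code written in C, the same store, dsb st, isb sequence could be wrapped in a small helper. This is only a sketch using GCC/Clang inline assembly; the names pte_install, pt_dsb_st and pt_isb are made up for this example, and on non-AArch64 hosts the barriers fall back to plain compiler barriers so that the code still builds.

```c
#include <stdint.h>

#if defined(__aarch64__)
#define pt_dsb_st() __asm__ volatile("dsb st" ::: "memory")
#define pt_isb()    __asm__ volatile("isb" ::: "memory")
#else
/* Assumption: on other hosts only a compiler barrier is used, so this builds. */
#define pt_dsb_st() __asm__ volatile("" ::: "memory")
#define pt_isb()    __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical helper: install a descriptor into a slot that currently
 * holds an invalid entry (the invalid->valid case discussed above). */
static inline void pte_install(volatile uint64_t *slot, uint64_t desc)
{
    *slot = desc;   /* str: write the new table entry */
    pt_dsb_st();    /* dsb st: make the store visible to the table walker */
    pt_isb();       /* isb: resynchronize before using the new mapping */
}
```

Any load through the newly mapped virtual address (the ldr x2, [x3] in the question) would then come after the call to pte_install.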

    A motivating (if somewhat contrived) example of why this may be necessary: suppose we execute a page table modification in the last few bytes of a page, and the entry we are writing maps new code into the page directly ahead of us. The CPU fetches the instructions from our current page, issues them, and then attempts to continue. It sees that we are crossing a page boundary and launches a page table walk. Since the store is not necessarily complete yet (and the MMU is not a coherent agent, so it does not stall for ordering), the walk may raise an exception because the translation has not been inserted yet. With the barriers in place, the DSB+ISB pair stops the CPU from fetching any further instructions until the pair retires. This forces the pipeline to fetch only after the store completes, so it never observes the stale invalid entry.