The RIDL exploit requires that the attacker trigger a page fault to be able to read stale data from the Line Fill Buffer. But according to *About the RIDL vulnerabilities and the "replaying" of loads*, an assisted load can also be used.
That question mentions assisted/assisting loads nine times, but I still couldn't wrap my head around what such a load does or how it is triggered. It apparently has something to do with the TLB and a load that "causes a page walk that requires a microcode assist".
Can someone explain what an assisted/assisting load is, preferably with a worked out example?
You left out the rest of the sentence in that quote, which explains why a page walk might need a microcode assist: "... causes a page walk that requires a microcode assist (to set the accessed bit in the page table entry)."
The x86 ISA says that reading or writing a page will set the "accessed" bit in the page table entry (PTE) for that mapping, if the bit wasn't already set. OSes can use this to see which pages are actually being accessed regularly (by clearing the accessed bit and letting HW set it again), so they can decide which pages to page out if they need to free up some physical pages. Same story for the "dirty" bit, which lets an OS know whether the page needs to be synced back to a file or other backing store, if any (e.g. it's how an OS can implement writeback for a `mmap(MAP_SHARED, PROT_WRITE)` mapping).
Page walks to fill TLB entries are pure dedicated hardware, but updating those PTE bits with stores is rare enough that it can be left to microcode: the CPU basically traps to internal microcode and runs it before resuming.

A similar mechanism is used in some CPUs to handle subnormal (aka denormal) floating-point results that the hard-wired FPU doesn't handle. This lets the common case (normalized floats) be lower latency.
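For instance, here's a minimal untested sketch of triggering those FP assists (assumption: `fp_assist.any` is the matching Skylake perf event for this; whether subnormals actually punt to microcode depends on the CPU and on MXCSR's FTZ/DAZ settings):

```c
// denorm_assist.c - produce subnormal FP results in a loop, so an FPU that
// punts denormals to microcode has to take an assist each iteration.
// Build: gcc -O2 denorm_assist.c -o denorm_assist
// Run:   perf stat -e fp_assist.any ./denorm_assist    (Skylake event name)
#include <stdio.h>

int main(void) {
    volatile double tiny = 1e-310;   // subnormal: below DBL_MIN (~2.2e-308)
    double sum = 0.0;
    for (long i = 0; i < 1000000; i++)
        sum += tiny * 0.5;           // subnormal input and result each time
    printf("%g\n", sum);             // keep the loop from being optimized out
    return 0;
}
```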
Related: a perf counter for these assists on Intel (Skylake at least): `perf stat -e other_assists.any`

> Number of times a microcode assist is invoked by HW other than FP-assist. Examples include AD (page Access Dirty) and AVX* related assists
Triggering assisted loads from user-space: I'm not sure what approach is good.
`msync(MS_SYNC)` on a file-backed mapping should clear the Dirty bit. IDK if it would clear the Accessed bit. Presumably a fresh file-backed `mmap` with `MAP_POPULATE` would have its Accessed bit clear, but be wired into the page table so it wouldn't take a `#PF` page fault exception. Maybe that also works with `MAP_ANONYMOUS`.
If you had multiple pages with their Accessed bits clear, you could loop over them, doing multiple assisted loads without an expensive system call in between; something like the sketch below.
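A sketch of that loop (untested; it assumes the `MAP_ANONYMOUS` guess above pans out and that `MAP_POPULATE` really wires the pages in with their Accessed bits clear):

```c
// assist_loop.c - map several pages up front with MAP_POPULATE, then do one
// load per page, hoping each first touch needs an A-bit microcode assist
// instead of a #PF page fault.
// Build: gcc -O2 assist_loop.c -o assist_loop
// Run:   perf stat -e other_assists.any ./assist_loop
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 64

int main(void) {
    long pagesz = sysconf(_SC_PAGESIZE);

    // MAP_POPULATE pre-faults the pages, so the loads below shouldn't #PF.
    volatile unsigned char *buf =
        mmap(NULL, NPAGES * pagesz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    // One load per page: if the PTEs are wired in with Accessed bits clear
    // (the untested guess above), each of these should need an assist.
    unsigned sum = 0;
    for (int i = 0; i < NPAGES; i++)
        sum += buf[(long)i * pagesz];

    printf("sum=%u\n", sum);   // defeat dead-code elimination
    return 0;
}
```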
On Linux kernel 4.12 and later, I suspect `madvise(MADV_FREE)` on private anonymous pages clears the Dirty bit, based on the way the man page describes it. It might also clear the Accessed bit, so a load might also need an assist, IDK. From the madvise(2) man page:
> **MADV_FREE** (since Linux 4.5)
>
> The application no longer requires the pages in the range specified by addr and len. The kernel can thus free these pages, but the freeing could be delayed until memory pressure occurs. For each of the pages that has been marked to be freed but has not yet been freed, the free operation will be canceled if the caller writes into the page. After a successful MADV_FREE operation, any stale data (i.e., dirty, unwritten pages) will be lost when the kernel frees the pages. However, subsequent writes to pages in the range will succeed and then kernel cannot free those dirtied pages, so that the caller can always see just written data. If there is no subsequent write, the kernel can free the pages at any time. Once pages in the range have been freed, the caller will see zero-fill-on-demand pages upon subsequent page references. The MADV_FREE operation can be applied only to private anonymous pages (see mmap(2)). In Linux before version 4.12, when freeing pages on a swapless system, the pages in the given range are freed instantly, regardless of memory pressure.
Or maybe `mprotect`, or maybe `mmap(MAP_FIXED|MAP_POPULATE)` a new anonymous page to replace the current page. With `MAP_POPULATE` it should already be wired into the HW page tables (not needing a soft page-fault on first access). The Dirty bit should be clear, and maybe also the Accessed bit.
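An untested sketch of the `madvise(MADV_FREE)` re-arm idea; whether the kernel merely clears the PTE bits or frees the page outright (soft #PF instead of an assist) is exactly what would need measuring:

```c
// rearm.c - repeatedly "re-arm" a private anonymous page with MADV_FREE
// and reload it, hoping each reload needs a microcode assist.
// Build: gcc -O2 rearm.c -o rearm
// Run:   perf stat -e other_assists.any ./rearm
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long pagesz = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p =
        mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned sum = 0;
    for (int i = 0; i < 100000; i++) {
        p[0] = 1;   // dirty the page again
        // Re-arm: hopefully this just clears the PTE's Dirty (and Accessed?)
        // bit. If the kernel instead frees the page outright, the next
        // access takes a soft #PF rather than an assist.
        if (madvise((void *)p, pagesz, MADV_FREE)) { perror("madvise"); return 1; }
        sum += p[0];   // reload: reads back 1, or 0 if the page was freed
    }
    printf("sum=%u\n", sum);
    return 0;
}
```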
A `vpmaskmovd` store with mask=0 (no actual store) will trigger an assist on a write-protected page, e.g. a lazily-allocated `mmap(PROT_READ|PROT_WRITE)` page that's only been read, not written, so it's still CoW-mapped to a shared physical page of zeros. It leaves the page clean, so this can happen every time in a loop over an array, as long as every store has mask=0 so no elements are actually replaced.
This is a little different from the Accessed/Dirty page-table assists you want. This assist is, I think, for fault suppression, because it needs to not take a `#PF` page fault. (The page is actually write-protected, not just clean.)
IDK if that's useful for MDS / RIDL purposes.
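In intrinsics form, the masked-store trick looks something like this (`vpmaskmovd` store = `_mm256_maskstore_epi32`; my sketch of the setup described above, not verified to produce assists in exactly this form):

```c
// maskstore_assist.c - vpmaskmovd store with an all-zero mask to a page
// that's still CoW-mapped to the shared zero page: nothing is written,
// but fault suppression on the write-protected page should need an assist.
// Build: gcc -O2 -mavx2 maskstore_assist.c -o maskstore_assist
// Run:   perf stat -e other_assists.any ./maskstore_assist
#include <immintrin.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long pagesz = sysconf(_SC_PAGESIZE);
    int *buf = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    volatile int sink = buf[0];  // read-only first touch: CoW-map the zero page

    __m256i mask = _mm256_setzero_si256();  // all mask bits 0: store no elements
    __m256i data = _mm256_set1_epi32(42);

    // The page stays write-protected and clean, so (per the description
    // above) the fault-suppression assist can fire on every iteration.
    for (int i = 0; i < 100000; i++)
        _mm256_maskstore_epi32(buf, mask, data);

    printf("%d\n", sink);
    return 0;
}
```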
I haven't tested masked loads from a freshly-allocated `mmap(MAP_POPULATE)` buffer to see if they take an assist but leave the Accessed bit unset.
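If someone wants to run that test, an untested sketch (`vpmaskmovd` load = `_mm256_maskload_epi32`); compare `other_assists.any` counts with and without the second pass:

```c
// maskload_test.c - masked loads (mask = 0) from fresh MAP_POPULATE pages:
// do they take an assist while leaving the Accessed bit unset?
// Build: gcc -O2 -mavx2 maskload_test.c -o maskload_test
// Run:   perf stat -e other_assists.any ./maskload_test
#include <immintrin.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 64

int main(void) {
    long pagesz = sysconf(_SC_PAGESIZE);
    int *buf = mmap(NULL, NPAGES * pagesz, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    __m256i mask = _mm256_setzero_si256();  // mask=0: load no elements
    __m256i acc = _mm256_setzero_si256();

    // Pass 1: one masked load per page. Do these take an assist, and do
    // they leave the Accessed bit unset? (That's the untested question.)
    for (int i = 0; i < NPAGES; i++)
        acc = _mm256_add_epi32(acc,
                _mm256_maskload_epi32(buf + (long)i * pagesz / sizeof(int), mask));

    // Pass 2: plain loads. If pass 1 left the Accessed bits clear, these
    // should still trigger A-bit assists; if it set them, they shouldn't.
    unsigned sum = 0;
    for (int i = 0; i < NPAGES; i++)
        sum += *(volatile int *)(buf + (long)i * pagesz / sizeof(int));

    printf("%u %d\n", sum, _mm256_extract_epi32(acc, 0));
    return 0;
}
```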