On RISC-V, a page table entry (which is the size of a usize in Rust) has a 2-bit RSW field that is reserved for supervisor software, i.e. the kernel can use it as it pleases. Could one of these bits be used as a lock for the rest of the page table entry?
I've written an example below. Any comments on improvement of the example code would also be appreciated.
use core::sync::atomic::{AtomicUsize, Ordering};

#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct PageTableEntry(usize);

impl PageTableEntry {
    // The first RSW bit is bit 8
    const LOCK_MASK: usize = 1 << 8;

    unsafe fn lock(page_table_entry: *mut Self) {
        unsafe {
            let atomic_entry = AtomicUsize::from_ptr(page_table_entry as *mut usize);
            loop {
                let value = atomic_entry.load(Ordering::Relaxed);
                if value & Self::LOCK_MASK != 0 {
                    core::hint::spin_loop();
                    continue;
                }
                if atomic_entry
                    .compare_exchange_weak(
                        value,
                        value | Self::LOCK_MASK,
                        Ordering::Acquire,
                        Ordering::Relaxed,
                    )
                    .is_ok()
                {
                    break;
                }
            }
        }
    }

    unsafe fn unlock(page_table_entry: *mut Self) {
        unsafe {
            let atomic_entry = AtomicUsize::from_ptr(page_table_entry as *mut usize);
            atomic_entry.fetch_and(!Self::LOCK_MASK, Ordering::Release);
        }
    }
}
The Rust core library says in regard to atomics:
The most important aspect of this model is that data races are undefined behavior. A data race is defined as conflicting non-synchronized accesses where at least one of the accesses is non-atomic. Here, accesses are conflicting if they affect overlapping regions of memory and at least one of them is a write. (A compare_exchange or compare_exchange_weak that does not succeed is not considered a write.) They are non-synchronized if neither of them happens-before the other, according to the happens-before order of the memory model.
I think this means my code has undefined behaviour if run in multiple threads, as a thread that holds the lock would perform non-atomic operations (including writes) on the page table entry while atomic reads of the lock occur. But the non-atomic operations never write to the lock bit, and the atomic operations only read the lock bit while non-atomic operations are in progress, so although this could lead to an atomic operation reading an otherwise-invalid page table entry, that operation only cares about the lock bit, which will be valid.
There are two things that marking an access as atomic does:
It causes the compiled program to do the access in such a way that the hardware will provide the concurrency guarantees you asked for: if the rules of the atomic orderings you chose require values to be visible to a particular thread, then the compiler will generate hardware instructions that provide that visibility, and atomic read-modify-write instructions will likewise compile into hardware instructions that prevent any other thread interrupting the read-modify-write sequence.
It tells the compiler that it is possible that the accessed value will change in a way that would normally be a race condition. The Rust compiler is generally allowed to optimize in a way that assumes that no race conditions occur, and can use this as a basis for proving an optimisation to be sound. So if you do a non-atomic access to memory that another thread is also allowed to access (except in the case where all the accesses are reads), you are violating an assumption that the compiler uses to make optimisations: and because the optimizer is reasoning based on incorrect information, this might cause a miscompile.
In the case of your program, you are locking a piece of memory by atomically changing the value of that memory, and then doing non-atomic writes to the locked memory. The problem is, what happens if another thread turns up while it's doing that, and tries to lock the same memory? In order to even check whether or not the memory is locked, it needs to read it – but it might do that at the same time as the thread that actually holds the lock is non-atomically writing it, which would be a race condition (a read racing with a write) even if they are looking at different bits within the same memory.
Looking at the two "reasons to use atomic" above, 1. might not apply on most processors (because most processors, IIRC including RISC-V, are actually able to handle simultaneous non-atomic reads and writes to the same address without undefined behaviour, as long as the read is racing with a pure write rather than a read-modify-write and isn't too large: the read sees either the value before or the value after the write). However, 2. is a real problem: if you don't use atomic accesses to the value even while it is locked, you have no guarantee that the compiler will generate code that actually stores the values in memory that you're expecting. (At least when using raw pointers to access the memory, the generated code won't touch bytes of memory that the program doesn't specify accessing – but it might still touch other bits within a single byte that you access, because Rust doesn't support accesses finer-grained than the byte level – and doing things like "OR with 0" still conceptually accesses the bytes that you're ORing with 0 and the optimizer is allowed to use the fact that you've accessed them to assume no race conditions.)
Or to explain the problem with your argument, from the compiler's point of view: you say "the non-atomic operations will never write to the lock bit", but they actually appear to the compiler as though they write to the entire usize, and that includes the lock bit; the fact that you're always overwriting the lock bit with the value it already had doesn't matter to the compiler, which just sees that it's allowed to write to that memory, and thus (e.g.) might assume that it could temporarily change that bit to a different value and back again, if doing so is more efficient for some reason.
You can fix this problem by using relaxed atomics for accesses to the values while they're locked: at the hardware level this will take care to modify the values in such a way that other threads can't see spurious values mid-modify, and at the compiler level this will take care to avoid doing anything weird with the values in memory that might disturb other threads. A relaxed atomic is usually pretty efficient in hardware (especially if you are just doing reads and writes rather than trying to make a read-modify-write sequence atomic), so the only real performance impact of this is to suppress optimizations that might be incorrect in a multithreaded context (but that's exactly what you want to do).
The easiest way to do this is to just declare the values you're operating on as atomic throughout the entire program (rather than as usize) – that way, you wouldn't need to unsafely transmute them back and forth.
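For instance, your example could be restructured along these lines (a sketch, not a drop-in replacement), wrapping the entry in an `AtomicUsize` from the start so that no pointer casting is needed and the lock methods become safe:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Same layout as a raw usize entry, but atomic throughout.
#[repr(transparent)]
struct PageTableEntry(AtomicUsize);

impl PageTableEntry {
    // The first RSW bit is bit 8.
    const LOCK_MASK: usize = 1 << 8;

    fn lock(&self) {
        loop {
            let value = self.0.load(Ordering::Relaxed);
            if value & Self::LOCK_MASK != 0 {
                std::hint::spin_loop();
                continue;
            }
            if self
                .0
                .compare_exchange_weak(
                    value,
                    value | Self::LOCK_MASK,
                    Ordering::Acquire,
                    Ordering::Relaxed,
                )
                .is_ok()
            {
                break;
            }
        }
    }

    fn unlock(&self) {
        self.0.fetch_and(!Self::LOCK_MASK, Ordering::Release);
    }
}

fn main() {
    let pte = PageTableEntry(AtomicUsize::new(0));
    pte.lock();
    assert_ne!(pte.0.load(Ordering::Relaxed) & PageTableEntry::LOCK_MASK, 0);
    pte.unlock();
    assert_eq!(pte.0.load(Ordering::Relaxed), 0);
    println!("ok");
}
```

Because `AtomicUsize` has the same in-memory representation as `usize`, `#[repr(transparent)]` keeps the layout the hardware expects for the page table.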
Meanwhile, one other suggestion: AtomicUsize usually isn't the best option for manipulating tagged-pointer values atomically in Rust. Any value read out of an AtomicUsize (or generated by a calculation on one, except for calculations that offset/index into an existing pointer/reference) can't be used to access memory without undefined behaviour, because the optimizer is looking for pointer/reference values to determine which accesses might alias which other accesses, and a usize is not a pointer.

For page table entries, treating them as integers is fine if they're just being used to configure the page tables (because if the Rust program isn't reading via those addresses itself, the compiler doesn't know or care whether the integers you write are actual addresses, or just indexes into an array containing all of memory) – but if you are trying to do a pagewalk or the like in your Rust code (i.e. actually dereferencing addresses from them, perhaps after flipping bits), the fact that the address has been stored in a usize will make your pagewalk undefined behaviour.

It is usually better to use AtomicPtr instead in any case where you want to be able to atomically access a pointer-sized value that might be used to access memory, even when some of the bits of that value are being used for other things: it is always safe to store an integer in an AtomicPtr (or pack integers together with pointers there), but pointers lose their ability to access memory when stored in an AtomicUsize. This is a consequence of the compiler making use of provenance to determine which pointers/references could potentially act on which memory addresses (and of the fact that in Rust, integers are not able to store provenance). (AtomicPtr was designed with the tagged-pointer use case in mind, and has lots of methods for doing things like atomically flipping bits that are useful in that context.)