In the Linux kernel, the arm64 version of arch_spin_lock() is implemented as follows:
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
    unsigned int tmp;
    arch_spinlock_t lockval, newval;

    asm volatile(
    /* Atomically increment the next ticket. */
    "   prfm    pstl1strm, %3\n"
    "1: ldaxr   %w0, %3\n"
    "   add     %w1, %w0, %w5\n"
    "   stxr    %w2, %w1, %3\n"
    "   cbnz    %w2, 1b\n"
    /* Did we get the lock? */
    "   eor     %w1, %w0, %w0, ror #16\n"
    "   cbz     %w1, 3f\n"
    /*
     * No: spin on the owner. Send a local event to avoid missing an
     * unlock before the exclusive load.
     */
    "   sevl\n"
    "2: wfe\n"
    "   ldaxrh  %w2, %4\n"
    "   eor     %w1, %w2, %w0, lsr #16\n"
    "   cbnz    %w1, 2b\n"
    /* We got the lock. Critical section starts here. */
    "3:"
    : "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
    : "Q" (lock->owner), "I" (1 << TICKET_SHIFT)
    : "memory");
}
Notice that the 'wfe' instruction puts the processor into a low-power state and waits for the event register to be set. The ARMv8 manual specifies that an event is generated if the global monitor for the PE is cleared (section D1.17.1). This should be done by the unlock path. But let's look at arch_spin_unlock():
static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
    asm volatile(
    "   stlrh   %w1, %0\n"
    : "=Q" (lock->owner)
    : "r" (lock->owner + 1)
    : "memory");
}
There is no SEV! So what wakes up the WFE in the lock path?
PS: I've been looking for ARM64 assembly tutorials but haven't found any. Suggestions would be much appreciated. Thanks!
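As a reading aid for the two assembly routines above, here is a rough sketch of the same ticket-lock logic in portable C11 atomics. It is not the kernel's code: the `sketch_*` names are made up for illustration, `owner` and `next` are kept as two separate 16-bit fields instead of being packed into one 32-bit word, and the wait loop is a plain busy-wait where the arm64 version uses sevl/wfe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical type: the kernel packs both halves into one 32-bit word. */
typedef struct {
    _Atomic uint16_t owner; /* ticket currently being served */
    _Atomic uint16_t next;  /* next ticket to hand out */
} sketch_spinlock_t;

static inline void sketch_spin_init(sketch_spinlock_t *lock)
{
    atomic_init(&lock->owner, 0);
    atomic_init(&lock->next, 0);
}

static inline void sketch_spin_lock(sketch_spinlock_t *lock)
{
    /* Take a ticket: atomically bump "next" and keep the old value.
     * This plays the role of the ldaxr/add/stxr retry loop. */
    uint16_t ticket = atomic_fetch_add_explicit(&lock->next, 1,
                                                memory_order_acquire);

    /* Spin until "owner" reaches our ticket. The arm64 code sleeps in
     * wfe between polls; this sketch simply keeps polling. */
    while (atomic_load_explicit(&lock->owner, memory_order_acquire) != ticket)
        ;
}

static inline void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    /* Only the holder writes "owner", so a relaxed read is enough here. */
    uint16_t cur = atomic_load_explicit(&lock->owner, memory_order_relaxed);

    /* The lock must be held: at least one ticket is outstanding. */
    assert(cur != atomic_load_explicit(&lock->next, memory_order_relaxed));

    /* Store-release of owner + 1, the counterpart of the single stlrh. */
    atomic_store_explicit(&lock->owner, (uint16_t)(cur + 1),
                          memory_order_release);
}
```

The unlock side is a single release store in both versions, which is exactly why the question of what wakes the waiter comes up.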
When locking, the line after the wfe
" ldaxrh %w2, %4\n"
performs an exclusive load-acquire of the lock's owner field. As stated in the previous comment, this marks the address of the lock in the global monitor.
The unlock code performs a store-release to the same address:
" stlrh %w1, %0\n"
This store clears the global monitor for the waiting PE, which generates the event that wakes the wfe. That is why the locking function uses an exclusive load-acquire (ldaxrh) rather than a regular load: the exclusive part is what arms the monitor. It is also why you don't need a SEV when unlocking.
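To make the sequence concrete, here is a small single-threaded toy model in C that replays the slow path for one waiting PE. Everything in it is an assumption for illustration: the `toy_*` names are invented, the monitor is reduced to "one marked address or none", and the event register to a single flag. The real architecture has more states and more wake-up sources than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of ONE waiting PE (hypothetical, heavily simplified). */
typedef struct {
    bool event_register;       /* set by SEVL or by the monitor being cleared */
    const uint16_t *monitored; /* address marked by the last exclusive load */
} toy_pe_t;

/* SEVL: set the local event register. */
static void toy_sevl(toy_pe_t *pe) { pe->event_register = true; }

/* WFE: if an event is pending, consume it and return true (no sleep).
 * Otherwise return false, meaning the PE would suspend here. */
static bool toy_wfe(toy_pe_t *pe)
{
    if (pe->event_register) {
        pe->event_register = false;
        return true;
    }
    return false;
}

/* LDAXRH: load the value and mark the address in the monitor. */
static uint16_t toy_load_exclusive(toy_pe_t *pe, const uint16_t *addr)
{
    pe->monitored = addr;
    return *addr;
}

/* A store from another PE (the STLRH in unlock). If the waiter has this
 * address marked, its monitor is cleared, which generates an event. */
static void toy_remote_store(toy_pe_t *waiter, uint16_t *addr, uint16_t val)
{
    *addr = val;
    if (waiter->monitored == addr) {
        waiter->monitored = NULL;
        waiter->event_register = true;
    }
}

/* Replays "sevl; 2: wfe; ldaxrh; compare; loop" for ticket 1 while the
 * owner is 0. Returns how many times the PE would have gone to sleep. */
static int toy_replay(void)
{
    uint16_t owner = 0;
    const uint16_t ticket = 1;
    toy_pe_t waiter = { false, NULL };
    int sleeps = 0;

    toy_sevl(&waiter); /* guarantees the first wfe falls through */
    for (;;) {
        if (!toy_wfe(&waiter)) {
            /* The PE would sleep. Model the holder unlocking now. */
            sleeps++;
            toy_remote_store(&waiter, &owner, (uint16_t)(owner + 1));
            assert(waiter.monitored == NULL); /* monitor was cleared */
            (void)toy_wfe(&waiter);           /* woken: consume the event */
        }
        if (toy_load_exclusive(&waiter, &owner) == ticket)
            return sleeps;
    }
}
```

In this replay the first wfe falls straight through because of sevl, the exclusive load arms the monitor, the second wfe is the one that would actually sleep, and the unlock store is what wakes it, without any SEV.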