assembly linux-kernel arm64 spinlock load-link-store-conditional

How is a spin lock woken up in Linux/ARM64?

In the Linux kernel, arch_spin_lock() is implemented as follows:

static inline void arch_spin_lock(arch_spinlock_t *lock)
{
    unsigned int tmp;
    arch_spinlock_t lockval, newval;

    asm volatile(
    /* Atomically increment the next ticket. */
"   prfm    pstl1strm, %3\n"
"1: ldaxr   %w0, %3\n"
"   add %w1, %w0, %w5\n"
"   stxr    %w2, %w1, %3\n"
"   cbnz    %w2, 1b\n"
    /* Did we get the lock? */
"   eor %w1, %w0, %w0, ror #16\n"
"   cbz %w1, 3f\n"
    /*
     * No: spin on the owner. Send a local event to avoid missing an
     * unlock before the exclusive load.
     */
"   sevl\n"
"2: wfe\n"
"   ldaxrh  %w2, %4\n"
"   eor %w1, %w2, %w0, lsr #16\n"
"   cbnz    %w1, 2b\n"
    /* We got the lock. Critical section starts here. */
"3:"
    : "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
    : "Q" (lock->owner), "I" (1 << TICKET_SHIFT)
    : "memory");
}
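As a reading aid, the ticket logic in the assembly above can be sketched in portable C11. This is only a model under stated assumptions, not the kernel's implementation: the `ticket_lock_t` type and the `ticket_lock`, `ticket_unlock` and `ticket_demo` names are hypothetical, the two 16-bit counters are kept as separate atomics rather than packed into one 32-bit word, and the waiter busy-polls where the kernel sleeps in wfe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of arch_spinlock_t: two 16-bit ticket counters. */
typedef struct {
    _Atomic uint16_t owner;  /* ticket currently being served */
    _Atomic uint16_t next;   /* next ticket to hand out */
} ticket_lock_t;

/* Take a ticket, then wait until it is served. Returns our ticket. */
static uint16_t ticket_lock(ticket_lock_t *l)
{
    /* Models the ldaxr/add/stxr loop: atomically bump 'next'. */
    uint16_t mine = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Models the loop at label 2. The kernel waits in wfe between
     * polls; this sketch simply busy-polls. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != mine)
        ;
    return mine;
}

/* Models stlrh: publish owner + 1 with release ordering. */
static void ticket_unlock(ticket_lock_t *l)
{
    uint16_t cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, (uint16_t)(cur + 1), memory_order_release);
}

/* Single-threaded walk-through: tickets are handed out in order. */
static void ticket_demo(void)
{
    ticket_lock_t l = { 0 };
    assert(ticket_lock(&l) == 0);
    ticket_unlock(&l);
    assert(ticket_lock(&l) == 1);
    ticket_unlock(&l);
}
```

Only the lock holder writes `owner`, which is why a plain load followed by a release store is enough in the unlock path of this sketch.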

Notice that the 'wfe' instruction puts the processor into a low-power state until its event register is set. The ARMv8 manual specifies that an event is generated when the global monitor for the PE is cleared (section D1.17.1), so I would expect the unlock path to trigger that. But let's look at arch_spin_unlock():

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
    asm volatile(
"   stlrh   %w1, %0\n"
    : "=Q" (lock->owner)
    : "r" (lock->owner + 1)
    : "memory");
}

There is no SEV! So what wakes up the WFE in the lock path?

PS: I've been looking for any ARM64 assembly tutorials but nothing came up. Would be awesome if someone has any suggestions. Thanks!

Solution

When locking, the line

" ldaxrh %w2, %4\n"

after the wfe performs an exclusive load-acquire of lock->owner. Because the load is exclusive, it marks the address of the lock in the global monitor for the waiting PE.

The unlock code performs a store-release to the same address:

" stlrh %w1, %0\n"

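This store clears the waiting PE's global monitor, and clearing the monitor generates the event that wakes the PE from wfe. That is why the locking function polls the owner with an exclusive load (ldaxrh) rather than a regular load, and why you don't need a SEV when unlocking. The acquire part only provides memory ordering; it is the exclusive part that arms the monitor.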
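The sequence can be walked through with a toy model in C. Everything here is an illustrative assumption rather than real hardware behaviour or a kernel API: `pe_t`, `load_exclusive`, `store_from_other_pe`, `sevl`, `wfe` and `wakeup_demo` are hypothetical names, and the model tracks a single waiting PE with a single monitored address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one waiting PE: its global-monitor entry and event register. */
typedef struct {
    const void *monitored;  /* address tagged by the last exclusive load, or NULL */
    bool event_reg;         /* the PE's event register */
} pe_t;

/* Models sevl: set the local event register. */
static void sevl(pe_t *pe) { pe->event_reg = true; }

/* Models wfe: returns true if execution continues (consuming the event),
 * false if the PE would go to sleep and wait. */
static bool wfe(pe_t *pe)
{
    if (pe->event_reg) {
        pe->event_reg = false;
        return true;
    }
    return false;
}

/* Models ldaxrh: read the value and tag the address in the monitor. */
static uint16_t load_exclusive(pe_t *pe, const uint16_t *addr)
{
    pe->monitored = addr;
    return *addr;
}

/* Models stlrh executed by another PE: the store clears the waiter's
 * monitor if it tags this address, and that clearing sets the event. */
static void store_from_other_pe(pe_t *waiter, uint16_t *addr, uint16_t val)
{
    *addr = val;
    if (waiter->monitored == addr) {
        waiter->monitored = NULL;
        waiter->event_reg = true;
    }
}

/* Walk-through of the spin loop for a waiter holding ticket 1. */
static void wakeup_demo(void)
{
    pe_t waiter = { NULL, false };
    uint16_t owner = 0;

    sevl(&waiter);                               /* "sevl" */
    assert(wfe(&waiter));                        /* first wfe falls through */
    assert(load_exclusive(&waiter, &owner) == 0);/* not our turn, monitor armed */
    assert(!wfe(&waiter));                       /* second wfe would sleep */
    store_from_other_pe(&waiter, &owner, 1);     /* unlock: plain store, no SEV */
    assert(wfe(&waiter));                        /* monitor cleared, PE wakes */
    assert(load_exclusive(&waiter, &owner) == 1);/* our ticket: lock acquired */
}
```

In this model, replacing `load_exclusive` with a plain read would leave `monitored` unset, so the unlocking store would never set the event register and the waiter would stay asleep.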