caching concurrency x86-64 atomic load-link-store-conditional

Can you snoop cache coherence traffic to implement load-linked and store-conditional?


I kind of want to implement a form of LL/SC for x86-64 (most likely on Sapphire Rapids / Emerald Rapids). It seems that the cache already has all the info needed to do this, but I need to know when a cache line is invalidated or modified (I can guarantee the alignment and size of the allocations).

Is there any way to snoop the QPI traffic? I had heard there was, but that you took a performance hit (bad). I really just want the system to tell me when this address range gets modified, so I can abort the store. Kind of like a less precise version of CAS, but with arbitrary-size operands.


Solution

  • Snoop with software? Not that I know of, other than umonitor/umwait to sleep until a value changes (or a timeout).

    Even if there were a way to check and branch on whether this core still had exclusive ownership of a cache line, a check-then-store sequence would have a TOCTOU (time-of-check to time-of-use) race window: an invalidate from another core could arrive between the check and the unconditional store.
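
    To make the race window concrete, here is a minimal sketch in C11 atomics (not from the original answer; the function names `racy_update`, `cas_update` and `demo` are made up for illustration). It contrasts a separate check and store with a single compare-and-swap, which on x86-64 compilers typically emit as `lock cmpxchg`:

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    // Racy: the check and the store are two separate operations, so another
    // thread can modify *p after the load and before the store.
    static bool racy_update(_Atomic uint64_t *p, uint64_t expected, uint64_t desired) {
        assert(p != NULL);
        if (atomic_load(p) != expected) {   // time of check
            return false;
        }
        // <-- another core's store can land here and then gets overwritten
        atomic_store(p, desired);           // time of use
        return true;
    }

    // Safe: the compare and the store happen as one atomic read-modify-write.
    static bool cas_update(_Atomic uint64_t *p, uint64_t expected, uint64_t desired) {
        assert(p != NULL);
        return atomic_compare_exchange_strong(p, &expected, desired);
    }

    // Single-threaded sanity check; the difference between the two functions
    // only shows up when other threads write to the same location.
    static void demo(void) {
        _Atomic uint64_t v = 1;
        assert(cas_update(&v, 1, 2));    // value matched, store performed
        assert(!cas_update(&v, 1, 3));   // stale expected value, store refused
        assert(racy_update(&v, 2, 4));   // works here only because nobody interferes
    }
    ```

    A hypothetical "do I still own this line?" check followed by a plain store would have the same shape as `racy_update`, which is why the check has to be part of the store itself.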


    TSX transactional memory, and RTM specifically (Restricted Transactional Memory), is a generalized form of LL/SC, with the CPU aborting the transaction if another core interferes. You start a transaction with an xbegin instruction, do some loads and stores, then run xend. So the check is after you've given the CPU all the loads/stores. The whole transaction either commits or aborts.

    David Kanter wrote some good articles about how CPUs can implement this, and specifically how Haswell probably did.


    If supported and enabled on your system, RTM can give you a generalized CAS, or even larger transactions involving multiple cache lines.
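
    As a rough sketch of what such a generalized CAS could look like (an illustration under assumptions, not a vetted implementation): the names `wide_cas`, `try_transaction`, `compare_and_copy` and `fallback_lock` are hypothetical, the code assumes GCC or Clang, and the buffer is assumed to be at most 64 bytes. Because a transaction can always abort, a non-transactional fallback path is required; here it is a simple global spinlock, which is only atomic with respect to other callers of `wide_cas`, not against arbitrary plain stores.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    // Hypothetical global fallback lock: 0 = free, 1 = held.
    static atomic_int fallback_lock;

    // Plain compare-then-copy. Only atomic when run inside a transaction
    // or while holding fallback_lock.
    static bool compare_and_copy(void *dst, const void *expected,
                                 const void *desired, size_t len) {
        if (memcmp(dst, expected, len) != 0) return false;
        memcpy(dst, desired, len);
        return true;
    }

    #if defined(__x86_64__) || defined(__i386__)
    #include <cpuid.h>      // __get_cpuid_count (GCC/Clang)
    #include <immintrin.h>  // _xbegin, _xend, _xabort

    // CPUID.(EAX=7,ECX=0):EBX bit 11 advertises RTM.
    static bool cpu_has_rtm(void) {
        unsigned eax, ebx, ecx, edx;
        return __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ebx & (1u << 11));
    }

    // Returns 1 (swapped), 0 (contents did not match), or -1 (aborted).
    __attribute__((target("rtm")))
    static int try_transaction(void *dst, const void *expected,
                               const void *desired, size_t len) {
        if (_xbegin() != _XBEGIN_STARTED) return -1;  // aborts resume here
        // Reading the lock puts it in the read set, so a fallback-path
        // caller taking the lock aborts this transaction.
        if (atomic_load_explicit(&fallback_lock, memory_order_relaxed)) _xabort(0xff);
        int ok = compare_and_copy(dst, expected, desired, len);
        _xend();  // commit: the whole compare-and-copy takes effect at once
        return ok;
    }
    #else
    // Non-x86 build: no RTM, always use the lock.
    static bool cpu_has_rtm(void) { return false; }
    static int try_transaction(void *dst, const void *expected,
                               const void *desired, size_t len) {
        (void)dst; (void)expected; (void)desired; (void)len;
        return -1;
    }
    #endif

    // Compare len bytes at dst with expected; if equal, replace with desired.
    bool wide_cas(void *dst, const void *expected, const void *desired, size_t len) {
        assert(len <= 64);  // assumption: operand fits in one cache line
        if (cpu_has_rtm()) {
            for (int attempt = 0; attempt < 3; attempt++) {
                int r = try_transaction(dst, expected, desired, len);
                if (r >= 0) return r;
            }
        }
        // Fallback: serialize through the lock (also taken when RTM always aborts).
        while (atomic_exchange_explicit(&fallback_lock, 1, memory_order_acquire)) {
            // spin until the lock is free
        }
        bool ok = compare_and_copy(dst, expected, desired, len);
        atomic_store_explicit(&fallback_lock, 0, memory_order_release);
        return ok;
    }
    ```

    The retry count of 3 is arbitrary. On a machine where RTM is disabled, every call simply ends up on the lock path, so the function still works but gives no benefit over a lock.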

    But it's not enabled everywhere due to hardware security vulnerabilities.

    Sapphire Rapids introduced a new TSX feature, TSXLDTRK, which suspends and resumes tracking of loads within a transaction, so you can read some memory without it becoming part of the transaction's read-set. This might mean that Sapphire Rapids has working RTM, unless a decision to disable TSX entirely was taken after that new feature was added to the manuals (in 2020, before the 2021 microcode update that disabled TSX on Skylake through Coffee Lake).

    On CPUs with TSX disabled, RTM transactions (xbegin) either always abort (see "Intel TSX: xbegin always returns 0") or xbegin faults as an illegal instruction (see "Assembler xbegin raise Illegal instruction").
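
    One way to tell these cases apart at run time is to check CPUID before ever executing xbegin, and then try a few empty transactions. The following is a sketch under assumptions (GCC or Clang; `rtm_probe`, `cpu_reports_rtm` and `count_commits` are hypothetical names), not a definitive detection method:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #if defined(__x86_64__) || defined(__i386__)
    #include <cpuid.h>      // __get_cpuid_count (GCC/Clang)
    #include <immintrin.h>  // _xbegin, _xend

    // CPUID.(EAX=7,ECX=0):EBX bit 11 advertises RTM.
    static bool cpu_reports_rtm(void) {
        unsigned eax, ebx, ecx, edx;
        return __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ebx & (1u << 11));
    }

    // Runs `attempts` empty transactions and counts how many commit.
    __attribute__((target("rtm")))
    static int count_commits(int attempts) {
        int commits = 0;
        for (int i = 0; i < attempts; i++) {
            if (_xbegin() == _XBEGIN_STARTED) {
                _xend();     // nothing inside: just begin and commit
                commits++;   // counted outside the transaction
            }
        }
        return commits;
    }
    #else
    // Non-x86 build: RTM does not exist.
    static bool cpu_reports_rtm(void) { return false; }
    static int count_commits(int attempts) { (void)attempts; return 0; }
    #endif

    // Returns -1 if CPUID does not advertise RTM (xbegin is not executed,
    // since it may fault), otherwise the number of committed transactions.
    // A result of 0 suggests RTM is present but every transaction aborts.
    int rtm_probe(int attempts) {
        assert(attempts >= 0);
        if (!cpu_reports_rtm()) return -1;
        return count_commits(attempts);
    }
    ```

    Even on hardware with working RTM, individual transactions can abort for unrelated reasons such as interrupts, so a low but non-zero count is not alarming; only a count of 0 over many attempts points at transactions being forced to abort.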


    Given the history of TSX being supported and then disabled in every Intel CPU since Haswell in 2013 (except Ice Lake where it wasn't supported at all), be prepared for support to disappear due to some future microcode update for your existing hardware, even if you can get something working now.