assembly arm atomic arm64 load-link-store-conditional

Will reads by other cores clear the exclusive status (ldrex) on ARM SMP?


I am writing assembly code for an SMP kernel that may run on either the ARMv7-A or the AArch64 architecture.

This code runs with IRQs disabled, so if I ldrex a memory address, the exclusive status will not be cleared by the arrival of an interrupt (on this core).

A write to this address by another CPU core will undoubtedly clear the exclusive status. What confuses me is whether other cores that only read this address (with ldr/ldrex) will also clear it.

The specific scenario is as follows:

Assuming compiler optimization is turned off and assembly code is generated as expected without memory reordering:

int x = 0;

core0:
    local_irq_disable();
    int tmp = __ldrex(&x);
    if (tmp)
        return;
    if (__strex(&x, 1)) { // write failed
        assert(READ_ONCE(x));  // Will this assert always succeed?
    }

core1/core2 or others:
    These cores may be in user mode or kernel mode, and they may read or write the variable x.
    If a write operation is performed on variable x, a non-zero value must be written atomically.

The ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition, section A3.4.2, shows: [image]

It seems to be missing:

Load(Tagged_address,!n) in Exclusive Access state

LoadExcl(Tagged_address,!n) in Exclusive Access state

And the notes are confusing:

The architecture does not require a load instruction by another processor, that is not a Load-Exclusive instruction, to have any effect on the global monitor.

A Load-Exclusive can only update the tagged Shareable memory address for the processor issuing the Load-Exclusive instruction.


Solution

  • Pro tip on ARM: get older docs as well as current docs. The ARM ARM, all the way back to the original digital version (ARMv4T/ARM7TDMI days), is still very helpful for ARMv7. For the TRM, get the current rev, then read the CPUID of the core in question and also get the rev that exactly matches it; that is the rev the vendor bought. It may differ from the core currently available for purchase, but the current docs may have bug fixes (in the docs; all docs have bugs). In this case you need the AXI spec, which has a section on exclusive access. Software sees one set of docs, the folks that implement the silicon see another, and the truth is somewhere in the middle. As of this writing rev K is the current one, but I liked rev H better for some things. Most of the exclusive access section is the same going all the way back to the original from 20 years ago. As with the ARM ARM and the TRMs, hopping between revisions can sometimes clarify things.

    IMPLEMENTATION DEFINED and UNPREDICTABLE do not necessarily mean that, but then sometimes they do. Assume they do and write code accordingly: conservatively if the code is generic, or to that chip's actual behavior if it targets a specific core on a specific chip.

    but what confuses me is whether other cores only read this address (like ldr/ldrex) will cause the exclusive status to be cleared?

    LDREX is the whole point here (exclusive access, along with its STREX partner), and it is documented in both the ARM ARM and the AXI spec. ARM ARM: to start an exclusive sequence you do a LoadExcl, and you end up in the Exclusive Access state. The monitor is always watching to match the last LoadExcl. Say core A does an LDREX and then gets an interrupt, and core B does an LDREX: the monitor is now watching for the second core's AxID, so core A's exclusive access is lost to the ether and core B is the one being monitored. If core B now gets its STREX in before core A, core B gets an EXOKAY. In the AXI spec, EXOKAY means the exclusive access succeeded; a plain OKAY in response to an exclusive write means it did not. Either response means the transaction itself completed without a bus fault, but on a plain OKAY the processor reports the STREX as failed. Ordinary LDR/STR return OKAY.

    LDR: basically IMPLEMENTATION DEFINED as to what happens. Experiment, experiment, experiment, and remember that how one chip behaves is not how all chips behave, especially when you are talking to chip vendor logic rather than ARM's.

    The architecture does not require a load instruction by another processor, that is not a Load-Exclusive instruction, to have any effect on the global monitor.

    The AXI spec does not specifically state what the monitor does when an LDR hits that address. The architecture/spec does not "REQUIRE" an LDR to have an effect, but it CAN have one. What an LDR does to the monitor is IMPLEMENTATION DEFINED (as is STR, as shown in the ARM ARM).
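
    Since the reason a store-exclusive failed is not something portable code can rely on, one defensive pattern is to treat a failure only as "re-check and maybe retry". The sketch below is an illustration, not the asker's code: it uses C11 `atomic_compare_exchange_weak` instead of raw `__ldrex`/`__strex` (compilers typically lower it to an LDREX/STREX loop on ARMv7, but that is an assumption about the toolchain), and the function name `try_claim` is hypothetical.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical helper: try to move *x from 0 to 1.
     * Returns true if this caller performed the 0 -> 1 transition,
     * false if a non-zero value was already there. */
    bool try_claim(atomic_int *x)
    {
        assert(x != NULL);
        int expected = 0;

        /* The weak form may fail spuriously, much like an STREX whose
         * monitor was cleared for some reason other than a real write. */
        while (!atomic_compare_exchange_weak(x, &expected, 1)) {
            if (expected != 0)
                return false;   /* a non-zero value was really observed */
            /* expected is still 0: the failure says nothing about the
             * value of *x, so just try again. */
        }
        return true;
    }
    ```

    The design choice here is that a failed store alone is never taken as proof that another core wrote a non-zero value; only the value that was actually read back decides.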

    A Load-Exclusive can only update the tagged Shareable memory address for the processor issuing the Load-Exclusive instruction.

    Same section of the ARM ARM says:

    Tagged_address = Memory_address[31:a]

    The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 3 and a maximum value of 11.
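
    To make the tagging concrete, here is a small sketch of what that assignment implies: the low `a` bits are dropped, so one tag covers 2^a bytes (8 bytes for a = 3, 2048 bytes for a = 11). The function names are made up for illustration, and the value of `a` for a real chip has to come from its documentation or from experiment.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Tagged_address = Memory_address[31:a]: keep bits 31..a, drop the rest. */
    uint32_t tagged_address(uint32_t addr, unsigned a)
    {
        assert(a >= 3 && a <= 11);   /* range quoted from the ARM ARM */
        return addr >> a;
    }

    /* Number of bytes covered by one tag: 2^a. */
    uint32_t granule_bytes(unsigned a)
    {
        assert(a >= 3 && a <= 11);
        return UINT32_C(1) << a;
    }

    /* True if the monitor cannot tell the two addresses apart for this a. */
    bool same_tag(uint32_t p, uint32_t q, unsigned a)
    {
        return tagged_address(p, a) == tagged_address(q, a);
    }
    ```

    For example, with a = 11 the addresses 0x12345000 and 0x123457FC share a tag, so an access to an unrelated variable in that 2048-byte region can affect the exclusive state for x just as an access to x itself would.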

    For LDREX/STREX to be of any use in a multi-core system, all the cores need to hit a common monitor with the same definition of the address for that resource. All the participating cores need to be set up so that they agree the address is 0x12345000, for example. And the transaction and that address space have to be such that it hits the common monitor.

    Tagged Shareable memory address: all the words matter. Tagged address means tagged address as defined in the doc (as IMPLEMENTATION DEFINED, haha), and it must match across the cores. It must be Shareable memory in order to hit the common monitor that can truly tell whether two cores are trying to touch the same thing. Inserting the words "Shareable memory" between "tagged" and "address" gives "tagged Shareable memory address".

    Software doc talks about global monitor. Silicon doc talks about reality...Subordinate monitor hardware. Possible to have more than one monitor as there are multiple AXI buses. The use of the word global, goes hand in hand with the shareable memory. You need to hit a commonly shared monitor, call it global or shared or pick a name.

    Arguably, you can really have multiple/many global monitors. Depending on the chip design, the on-chip SRAM may be an extension/reflection of the AXI bus (vs. a chip vendor's own bus design). And the DRAM, which of course is eventually another set of buses, may use an extension or reflection of the AXI bus as well. This is easily discoverable from bare-metal code. How many monitors there are does not matter here; what matters is that you pick an address that lands on the same monitor for all the cores using it.

    I deleted a whole bunch of text before posting this answer. IMO you should go read a version of the AXI spec with respect to exclusive access. There is a laundry list of signals that are monitored and that ideally need to match between the read and the write for the STREX to write successfully. For example, LDREXB followed by STREX is not expected to work: length matters. AxLOCK is one of those signals (when asserted, it indicates the transaction is exclusive). So does a given exclusive access monitor only watch transactions with AxLOCK asserted (LDREX/STREX, not LDR/STR)? Or did the chip vendor primarily look at the address (while waiting for an STREX to match an LDREX that has not been cleared by CLREX) and then require the other signals, including AxLOCK, to match? Implementation defined. (Also search for EXOKAY for a signal description; OKAY is listed right next to it, and together they show how the logic returns pass/fail for STREX.)

    Yes they use the word "should" and also say "might fail even if...". Which may cause you to post another question, but there is no single/global answer. If you choose to read rev K then also go read rev H (the exclusive access section is as short of a read if not shorter than the ARM ARM's).

    If you truly want to understand CLREX/STREX/LDREX, IMO you have to read the bus docs for that family of cores as well as the ARM ARM; the truth is somewhere in the middle.