atomicarmv8data-race

is data race safe in ARMV8?


As we know, access aligned fundamental data types in INTEL X86 architecture is atomic. How about ARMV8? I have tried to get the result from Arm Architecture Reference Manual Armv8, for A-profile architecture, I did find something related to atomicity. ARMV8 is other-multi-copy atomic. It promises that multi-threads access one same LOCATION is atomic. But it says LOCATION is a byte. I am wondering that if thread 1 writes an aligned uint64_t memory without lock and thread 2 reads or writes it without lock at the same time. Is it atomic?(uint64_t is 8 bytes, but LOCATION is only one byte)


Solution

  • This is explained in B2.2 of the ARMv8 Architecture Reference Manual. In general, ordinary loads and stores of up to 64 bits, if naturally aligned, are single-copy atomic. In particular, if one thread stores to an address and another loads that same address, the load is guaranteed to see either the old or the new value, with no tearing or other undefined behavior. This is roughly analogous to a relaxed load or store in C or C++; indeed, you can see that compilers emit ordinary load and store instructions for such atomic accesses. https://godbolt.org/z/cWjaed9rM


    Let's prove this for an example. For simplicity, let's use an aligned 2-byte halfword H, calling its bytes H0 and H1. Suppose that in the distant past, H was initialized to 0x0000 by a store instruction Wi; the respective writes to bytes H0 and H1 will be denoted Wi.0 and Wi.1. Now let a new store instruction Wn = {Wn.0,Wn.1} store the value 0xFFFF, and let it race with a load instruction R = {R.0,R.1}. Each of the accesses Wi, Wn, R is single-copy atomic by B2.2.1, first two bullets. We wish to show that either R.0,R.1 both return 0x00, or else they both return 0xFF.

    By B2.3.2 there is a reads-from relation pairing each read with some write. R.0 must read-from either Wi.0 or Wn.0, as those are the only two writes to H0, and thus it must return either 0x00 or 0xFF. Likewise, R.1 must also return either 0x00 or 0xFF. If they both return 0x00 we are done, so suppose that one of them, say R.1, returns 0xFF, and let us show that R.0 also returns 0xFF.

    We are supposing that R.1 reads-from Wn.1. By B2.2.2 (2), none of the overlapping writes generated by Wn are coherence-after the corresponding overlapping reads generated by R, in the sense of B2.3.2. In particular, Wn.0 is not coherence-after R.0.

    Note that Wn.0 is coherence-after Wi.0 (coherence order is a total order on writes, so one must come after the other, and we are assuming Wi took place very long ago, with sufficient sequencing or synchronization in between). So if R.0 reads-from Wi.0, we then have that Wn.0 is coherence-after R.0 (definition of coherence-after, second sentence). We just argued that is not the case, so R.0 does not read-from Wi.0; it must read-from Wn.0 and therefore return 0xFF. ∎


    Note that on x86, ordinary loads and stores implicitly come with acquire and release ordering respectively, and this is not true on ARM64. You have to use ldar / stlr for that.