c++atomicc11stdatomictest-and-set

Why does C/C++ not have a atomic flag test_and_clear?


If you wanted to do something like set a flag on a event in some context, and in some other context check and clear the event, C/C++ doesn't allow you to do in single atomic call per context.

You'd have to invert the flag, so clear the flag on the event, check and set the flag when checking the event.

Not a huge deal, but seems backwards in this scenario, especially considering the default state of the flag is false, which in the inverted sense means the event is asserted by default.

I supposed alternatively, an atomic bool with atomic_exchange could be used instead.


Solution

  • Practical advice: use atomic<bool> or atomic<unsigned char> instead of atomic_flag in normal code if its limited set of operations makes your code less efficient. Unless you care about being lock-free on very primitive machines where atomic<bool> and atomic<int> might not be lock-free. (Or perhaps use C++20 std::atomic_unsigned_lock_free.)


    TAS (Test And Set) is one of the well-known primitive atomic RMW operations in computer science that can be a building block for mutual-exclusion (locking). On some old hardware (like Motorola 68000), it's the only atomic RMW.

    There are no machines where the only atomic RMW is a test-and-clear. (Zero-initializing memory is the norm, so a spinlock or mutex in static storage will start out in the unlocked state if 0 means unlocked, and TAS takes the lock. You need an atomic RMW when acquiring a spinlock but not when releasing.)

    TAS can also be efficiently implemented in terms of an atomic exchange, aka swap, which some other old hardware provides as its only atomic RMW. (local = foo.exchange(true) and test the result.)

    But neither TAS nor exchange work as building blocks for arbitrary lock-free atomic RMWs like fetch_xor or CAS (Compare-and-Swap e.g. compare_exchange_weak/strong). A machine with just TAS or a way to emulate it can't provide lock-free std::atomic<bool>, but can provide lock-free std::atomic_flag.

    (LL/SC or CAS itself are building blocks for arbitrary lock-free RMWs on a single variable. All(?) modern machines that support multiple cores have at least one of those, and sometimes also direct support for some of the common integer operations like fetch_add, fetch_or, exchange, etc. like on x86-64 and ARMv8.1. And of course atomicity guarantees for pure-load and pure-store operations so .load() can be done as an actual load, not an RMW which would fault on read-only memory and have contention between readers.)

    Hypothetically, on a CPU with only a test-can-clear, you could still implement atomic_flag with the C++ false state having a non-zero object-representation; before C++20 it wasn't required that static initialization made it false. But if a CPU had TAS and "TAC" (or exchange which can do both) but not enough functionality to implement lock-free atomic<bool>, you can't take full advantage of it via atomic_flag.


    atomic_flag is required to be lock-free, unlike any atomic<T>. It exists to expose a minimal set of lock-free functionality that can be implemented across a wide range of hardware that's able to implement ISO C++11 at all (including std::mutex and the locking for non-lock-free std::atomic<T>.)

    Some things it avoids requiring:


    If any operation supported by atomic<bool> needs a mutex (due to lack of CAS or LL/SC), all operations have to respect the mutex. (Except pure loads, if tearing and memory ordering aren't a problem, e.g. for relaxed loads of a byte or bool, or atomic int on some systems, but that's a corner case for obsolete systems that modern compilers probably don't bother with.) Then atomic<bool>::is_always_lock_free has to be false.

    C++20 added std::atomic_signed_lock_free and std::atomic_unsigned_lock_free integer types that are guaranteed lock-free (and are "most efficient" for wait/notify). Those are optional in freestanding implementations (not hosted under an OS), but I think this rules out C++20 hosted implementations on 386 or 68000; you'd need 486 or 68020 or later. I think C++20 decided to add useful stuff that people want even if that means some retro hardware can't implement C++20. Modern C++ has been making choices like that in other areas, like requiring signed integers to be two's complement, dropping the implementation choice of one's complement or sign/magnitude.

    In ISO C, the thread/mutex/atomics stuff is optional, unlike ISO C++, so modern C can still be implemented on ancient hardware by leaving out the thread support library entirely.


    ISAs with just TAS or Swap


    Mutual exclusion without atomic RMWs?

    Technically, mutual exclusion is possible without an atomic RMW via Peterson's algorithm with just seq_cst loads and stores, but that requires an array with an entry for each possible thread, and C++ allows starting new threads after mutex objects already exist and are in use. So that's not really viable. Lamport's bakery algorithm has the same requirement for an array with size = NUM_THREADS.

    C11 and C++11 weren't interested in trying to support threads on such ancient CPUs where multiprocessing is such a struggle, so they went ahead and required atomic_flag to support lock-free RMWs.

    On a uniprocessor machine, one way to implement any atomic RMW is just to disable/re-enable interrupts around it. This is still lock-free in a sense: it can't create a deadlock between an interrupt or signal handler and the main thread like an actual spinlock or mutex could. This would require a system call in a hosted implementation.

    Or, on uniprocessor machines where interrupts can only happen at instruction boundaries, do it with a single instruction. e.g. x86 add [mem], eax is atomic wrt. interrupts and thus context switches on the same core, but not in a multi-core system unless you also use the lock prefix. (Is incrementing an int effectively atomic in specific cases?).

    So some CISCs with memory-destination RMW instructions can use those for operations that only have to be atomicity wrt. other threads (or interrupt handlers) that could only run on the same core (because this is a uniprocessor system or because we only care about interrupt / signal handlers). Even though the instructions don't have any special RMW guarantees wrt. concurrent writers like DMA or other bus-master I/O devices. Or other cores if any existed.

    Some ISAs like m68k can save partial progress of executing an instruction and resume after an interrupt, defeating this. But x86 isn't like that; interrupts are only processed at instruction boundaries. (See Is x86 CMPXCHG atomic, if so why does it need LOCK?)


    Related: