
Understanding std::atomic::compare_exchange_weak() in C++11


bool compare_exchange_weak (T& expected, T val, ..);

compare_exchange_weak() is one of the compare-exchange primitives provided in C++11. It is weak in the sense that it may return false even if the value of the object is equal to expected. This is due to spurious failures on some platforms where a sequence of instructions (instead of a single one, as on x86) is used to implement it. On such platforms, a context switch, another thread reloading the same address (or cache line), etc., can make the primitive fail. The failure is spurious because it is not caused by the value of the object differing from expected; it is essentially a timing issue.
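
To make the failure contract concrete, here is a minimal single-threaded sketch; cas_weak_once is a hypothetical helper name, not a standard API. It shows that a failed call (real or spurious) overwrites expected with the value it observed, which is what the retry loops discussed below rely on.

```cpp
#include <atomic>
#include <utility>

// Performs a single compare_exchange_weak() and reports what happened:
//   .first  = whether the exchange succeeded,
//   .second = the value left in `expected` afterwards.
std::pair<bool, int> cas_weak_once(std::atomic<int>& value, int expected, int desired) {
    bool ok = value.compare_exchange_weak(expected, desired);
    // On failure (real or spurious), `expected` now holds the value observed in `value`.
    // On success, `expected` is left untouched.
    return {ok, expected};
}
```

For example, if value holds 5, cas_weak_once(value, 3, 10) must return {false, 5} and leave value at 5. cas_weak_once(value, 5, 10) normally returns {true, 5}, but on LL/SC platforms it may also return {false, 5} with value unchanged; a false result with expected still equal to its old value is the signature of a spurious failure.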

But what puzzles me is what the C++11 Standard (ISO/IEC 14882) says:

29.6.5 .. A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.

Why does it have to be in a loop in nearly all uses? Does that mean we should loop when it fails because of spurious failures? If so, why bother using compare_exchange_weak() and writing the loop ourselves? We could just use compare_exchange_strong(), which I think should get rid of spurious failures for us. What are the common use cases of compare_exchange_weak()?

A related question: in his book "C++ Concurrency in Action", Anthony Williams says,

//Because compare_exchange_weak() can fail spuriously, it must typically
//be used in a loop:

bool expected=false;
extern atomic<bool> b; // set somewhere else
while(!b.compare_exchange_weak(expected,true) && !expected);

//In this case, you keep looping as long as expected is still false,
//indicating that the compare_exchange_weak() call failed spuriously.

Why is !expected in the loop condition? Is it there to prevent all threads from starving and making no progress for some time?

One last question:

On platforms where no single hardware CAS instruction exists, both the weak and strong versions are implemented using LL/SC (e.g., ARM, PowerPC). So is there any difference between the following two loops, and if so, why? (To me, they should have similar performance.)

// use LL/SC (or CAS on x86) and ignore/loop on spurious failures
while (!compare_exchange_weak(..))
{ .. }

// use LL/SC (or CAS on x86) and ignore/loop on spurious failures
while (!compare_exchange_strong(..)) 
{ .. }

I came up with this last question because you all mention that there may be a performance difference inside a loop. The C++11 Standard (ISO/IEC 14882) mentions it too:

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.

But as analyzed above, the two versions in a loop should give the same or similar performance. What am I missing?


Solution

  • I'm trying to answer this myself after going through various online resources (e.g., this one and this one), the C++11 Standard, and the answers given here.

    The related questions are merged (e.g., "why !expected?" is merged with "why put compare_exchange_weak() in a loop?") and answered accordingly.


    Why does compare_exchange_weak() have to be in a loop in nearly all uses?

    Typical Pattern A

    You need to perform an atomic update based on the current value of the atomic variable. A failure indicates that the variable was not updated with our desired value, and we want to retry. Note that we don't really care whether it failed because of a concurrent write or a spurious failure. But we do care that it is we who make this change.

    expected = current.load();
    do desired = function(expected);
    while (!current.compare_exchange_weak(expected, desired));
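
    As a concrete instance of Pattern A, here is a minimal sketch that assumes function() is "multiply by a factor"; fetch_multiply is a hypothetical helper (std::atomic<int> offers fetch_add but no fetch_mul), written in the style of the standard fetch_* functions.

```cpp
#include <atomic>

// Pattern A sketch: atomically multiply `current` by `factor`.
// Returns the value that was replaced, like std::atomic::fetch_add does.
int fetch_multiply(std::atomic<int>& current, int factor) {
    int expected = current.load();
    int desired;
    do {
        // Recompute from the freshest value: on any failure (spurious or due
        // to a concurrent write) compare_exchange_weak has reloaded `expected`.
        desired = expected * factor;
    } while (!current.compare_exchange_weak(expected, desired));
    // On success `expected` was not modified, so it is the replaced value.
    return expected;
}
```

    Whether the weak call failed spuriously or because another thread won the race does not matter here: either way the loop recomputes desired from the reloaded expected and tries again.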
    

    A real-world example is several threads concurrently adding an element to a singly linked list. Each thread first loads the head pointer, allocates a new node, and makes the new node point to the current head. Finally, it tries to replace the head with the new node; if the head has changed in the meantime, it re-links the node and retries.
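
    The list example can be sketched as a lock-free stack push. Node, push, length and push_concurrently are hypothetical names for illustration, and nodes are only freed after all threads have joined, which sidesteps the memory-reclamation problem a real lock-free list would have to solve.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical node type for illustration.
struct Node {
    int value;
    Node* next;
};

// Pattern A: push a new node onto a shared singly linked list.
void push(std::atomic<Node*>& head, int value) {
    Node* node = new Node{value, head.load()};
    // If the CAS fails (another thread changed head, or a spurious failure),
    // compare_exchange_weak writes the current head into node->next, so the
    // node is already re-linked for the next attempt.
    while (!head.compare_exchange_weak(node->next, node)) {
    }
}

// Counts the nodes reachable from head.
std::size_t length(const std::atomic<Node*>& head) {
    std::size_t n = 0;
    for (Node* p = head.load(); p != nullptr; p = p->next) ++n;
    return n;
}

// Usage: four threads push 1000 nodes each onto one shared list.
std::size_t push_concurrently() {
    std::atomic<Node*> head{nullptr};
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&head, t] {
            for (int i = 0; i < 1000; ++i) push(head, t * 1000 + i);
        });
    }
    for (auto& w : workers) w.join();

    std::size_t n = length(head);
    // All threads are done, so freeing the nodes here is safe.
    for (Node* p = head.load(); p != nullptr;) {
        Node* next = p->next;
        delete p;
        p = next;
    }
    return n;
}
```

    With a correct CAS loop no push is lost, so push_concurrently() should return 4000 on every run.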

    Another example is implementing a mutex with std::atomic<bool>. At most one thread can enter the critical section at a time, namely whichever thread first sets current to true and exits the loop.
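
    A minimal spinlock sketch of that idea follows; SpinLock and count_with_lock are hypothetical names, and real code would normally just use std::mutex. Because a failed CAS overwrites expected with true, the loop resets it to false before each retry so that only the false-to-true transition counts as acquiring the lock.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative spinlock built on std::atomic<bool>.
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // A failure (spurious, or because another thread holds the lock)
        // sets `expected` to the observed value, so reset it before retrying.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Usage: several threads increment a plain int under the lock.
int count_with_lock(int threads, int iterations) {
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

    If the lock works, count_with_lock(4, 10000) returns exactly 40000; without it, the unsynchronized ++counter would be a data race.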

    Typical Pattern B

    This is actually the pattern mentioned in Anthony's book. In contrast to Pattern A, you want the atomic variable to be updated once, but you don't care who does it. As long as it has not been updated, you try again. This is typically used with boolean variables, e.g., to implement a trigger that lets a state machine move on. It doesn't matter which thread pulls the trigger.

    expected = false;
    // !expected: if expected is set to true by another thread, it's done!
    // Otherwise, it fails spuriously and we should try again.
    while (!current.compare_exchange_weak(expected, true) && !expected);
    

    Note that we generally cannot use this pattern to implement a mutex: the loop exits both for the thread that set the flag and for every thread that finds it already set, so multiple threads could end up inside the critical section at the same time.
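
    Pattern B can be sketched as follows; pull_trigger and count_pullers are hypothetical names. The loop ends for every caller, which is why it cannot serve as a mutex, but the final value of expected still tells a caller whether it performed the transition itself.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Pattern B sketch: make sure `fired` ends up true, whoever sets it.
// Returns true if this call performed the false -> true transition.
bool pull_trigger(std::atomic<bool>& fired) {
    bool expected = false;
    // Stop when the CAS succeeds (we set it) or when a failure reports that
    // the flag is already true (someone else set it). A failure that leaves
    // expected == false was spurious, so retry.
    while (!fired.compare_exchange_weak(expected, true) && !expected) {
    }
    return !expected;
}

// Usage: many threads race to pull the trigger; count how many did it.
int count_pullers(int threads) {
    std::atomic<bool> fired{false};
    std::atomic<int> winners{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            if (pull_trigger(fired)) ++winners;
        });
    }
    for (auto& w : workers) w.join();
    return winners.load();
}
```

    Since the flag is never reset, exactly one thread observes the transition, so count_pullers(8) returns 1 while all eight threads leave the loop.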

    That said, it should be rare to use compare_exchange_weak() outside a loop. Conversely, there are cases where the strong version is used on its own, e.g.,

    bool criticalSection_tryEnter(std::atomic<bool>& lock)
    {
      bool flag = false;
      // One attempt only: with the strong form, false really means "already held".
      return lock.compare_exchange_strong(flag, true);
    }
    

    compare_exchange_weak() is not appropriate here: if it returns false because of a spurious failure, the caller concludes that the critical section is occupied even though, quite possibly, no one holds it yet.

    Starving Thread?

    One point worth mentioning is what happens if spurious failures keep happening and starve the thread. In theory this can happen on platforms where compare_exchange_XXX() is implemented as a sequence of instructions (e.g., LL/SC): frequent accesses to the same cache line between the LL and the SC produce repeated spurious failures. A more realistic example is a pathological schedule in which all concurrent threads interleave as follows.

    Time
     |  thread 1 (LL)
     |  thread 2 (LL)
     |  thread 1 (compare, SC), fails spuriously due to thread 2's LL
     |  thread 1 (LL)
     |  thread 2 (compare, SC), fails spuriously due to thread 1's LL
     |  thread 2 (LL)
     v  ..
    

    Can it happen?

    Fortunately, it should not happen forever, thanks to what C++11 asks of implementations:

    Implementations should ensure that weak compare-and-exchange operations do not consistently return false unless either the atomic object has value different from expected or there are concurrent modifications to the atomic object.

    Why bother using compare_exchange_weak() and writing the loop ourselves? We could just use compare_exchange_strong().

    It depends.

    Case 1: When both need to be used inside a loop. C++11 says:

    When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.

    On x86 (at least for now; it might one day adopt a scheme similar to LL/SC for performance as core counts grow), the weak and strong versions are essentially the same, because both boil down to the single instruction lock cmpxchg. On other platforms where compare_exchange_XXX() is not implemented by a single hardware primitive (e.g., LL/SC on ARM or PowerPC), the weak version inside a loop may win, because the strong version must itself detect spurious failures and retry internally, which nests a second loop inside yours.
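
    To see why a strong CAS in a loop amounts to a loop inside a loop, here is a conceptual sketch of strong semantics expressed through the weak form. strong_cas_via_weak is a hypothetical name, it is not how any particular standard library is written, and it assumes T is an integer or pointer type for which != matches bitwise comparison.

```cpp
#include <atomic>

// Conceptual sketch: strong compare-and-exchange semantics built from the
// weak form. On LL/SC machines, compare_exchange_strong() needs an internal
// retry of this kind, so putting it inside your own loop nests two loops.
template <typename T>
bool strong_cas_via_weak(std::atomic<T>& obj, T& expected, T desired) {
    const T original = expected;
    while (!obj.compare_exchange_weak(expected, desired)) {
        if (expected != original) {
            // Genuine failure: the stored value differed;
            // `expected` now holds that current value.
            return false;
        }
        // Spurious failure: the observed value still equals the original,
        // so try again.
    }
    return true;
}
```

    A user loop around strong_cas_via_weak() would therefore spin in two places, while a user loop around compare_exchange_weak() handles both kinds of failure in one.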

    But, rarely, we may prefer compare_exchange_strong() over compare_exchange_weak() even in a loop, e.g., when there is a lot of work between loading the atomic variable and exchanging in the newly computed value (see function() above). If the atomic variable itself doesn't change frequently, we don't want to repeat the costly calculation on every spurious failure. Instead, we can let compare_exchange_strong() "absorb" such failures, so we only repeat the calculation when it fails due to a real value change.
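
    A minimal sketch of that case, assuming a hypothetical expensive_transform() standing in for function(); update is likewise an illustrative name. The loop shape is Pattern A, but with the strong form the costly recomputation only runs again after a genuine concurrent change.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a costly computation of the new value
// (unsigned overflow wraps around, which is well defined).
std::uint64_t expensive_transform(std::uint64_t x) {
    std::uint64_t h = x;
    for (int i = 0; i < 100000; ++i) {
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return h;
}

// Pattern A with the strong form: spurious failures are retried inside
// compare_exchange_strong(), so expensive_transform() is re-run only when
// another thread has actually changed `current`.
std::uint64_t update(std::atomic<std::uint64_t>& current) {
    std::uint64_t expected = current.load();
    std::uint64_t desired;
    do {
        desired = expensive_transform(expected);
    } while (!current.compare_exchange_strong(expected, desired));
    return desired;
}
```

    Whether this actually beats the weak form depends on the platform and on how often spurious failures occur; it is worth measuring rather than assuming.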

    Case 2: When only compare_exchange_weak() needs to be used inside a loop. C++11 also says:

    When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable.

    This is typically the case when you loop only to eliminate spurious failures from the weak version: you retry until the exchange either succeeds or fails because of a concurrent write.

    expected = false;
    // !expected: if it fails spuriously, we should try again.
    while (!current.compare_exchange_weak(expected, true) && !expected);
    

    At best, this reinvents the wheel and performs the same as compare_exchange_strong(). At worst, it fails to take full advantage of machines that provide a non-spurious compare-and-exchange in hardware.
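
    For comparison, a sketch of the same update written with a single strong call; set_once is a hypothetical name. A strong CAS never fails spuriously, so a false result already means the flag was true and no loop is needed.

```cpp
#include <atomic>

// Same effect as
//   while (!current.compare_exchange_weak(expected, true) && !expected);
// A strong CAS only fails when the stored value differs from `expected`,
// so one call suffices. Returns true if this call set the flag.
bool set_once(std::atomic<bool>& current) {
    bool expected = false;
    return current.compare_exchange_strong(expected, true);
}
```

    After any call, current is true, and at most one caller ever sees a true return value.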

    Finally, if you loop for other reasons (e.g., see "Typical Pattern A" above), then compare_exchange_strong() would most likely have to be put in a loop as well, which brings us back to the previous case.