Tags: c++, c++11, language-lawyer, mutex, memory-model

Can the C++ compiler coalesce adjacent mutex locks?


Consider the following code, with adjacent mutex sections containing shared memory accesses:

#include <mutex>

std::mutex mutex;
int x;

void func() {
    {
        std::lock_guard<std::mutex> lock{mutex};
        x++;
    }
    {
        std::lock_guard<std::mutex> lock{mutex};
        x++;
    }
}

Without the mutexes, the compiler could obviously coalesce the increments, like so:

void func() {
    x += 2;
}

This got me thinking about what might prevent that optimisation. We know that even if x is std::atomic<int>, the optimisation is still legal (though perhaps not performed in practice).
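For illustration, here is what the atomic version of that claim looks like (a sketch I'm adding; the name func_atomic and the separate variable x_atomic are mine, not from the question):

```cpp
#include <atomic>

std::atomic<int> x_atomic{0};

// Two adjacent atomic increments.
void func_atomic() {
    x_atomic.fetch_add(1);
    x_atomic.fetch_add(1);
}
// A compiler may legally fuse these into a single
// x_atomic.fetch_add(2): any execution in which no other thread
// happens to observe the intermediate value is already a valid
// execution of the original program, so the as-if rule is satisfied.
```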

But what about with mutexes? Is it legal for the compiler to transform func into this?

void func() {
    std::lock_guard<std::mutex> lock{mutex};
    x += 2;
}

It's clear that the writes to x must be visible side effects to other threads due to acquire-release semantics, but I'm not sure whether that forbids the optimisation. Perhaps one can argue under the as-if rule that you couldn't tell whether the optimisation was done or not, and therefore that it is legal (since the difference between "unlock then lock again" and "keep holding the lock" depends only on thread-scheduling timing).
On the other hand, it seems plausible this optimisation could slippery-slope to every program containing a mutex being optimised into: lock mutex, do entire program, unlock mutex. Such behaviour would be obviously unhelpful for concurrent applications, so it might be reasonably forbidden.

What aspect of the C++ memory model, and wording in the Standard, allows or forbids this optimisation?

(Currently, Clang and GCC do not perform this optimisation, as can be seen here. However, this is not proof that the optimisation is or should necessarily be illegal.)


Solution

  • Yes, the transformation is allowed, for the reasons you yourself gave.

    For any given input, no matter how the rest of the code looks, the set of observable behaviors permitted with the transformed function will always be a subset of the observable behaviors permitted with the original function.

    There is no requirement that unlocking the mutex causes another thread blocked on acquiring it to wake up immediately and acquire the lock atomically with the unlock. It is therefore perfectly fine to schedule the unlocking thread so that it immediately reacquires the lock, regardless of the actions of any other thread.

    Whether the lock was released is not itself observable behavior. Observable behavior consists only of accesses to volatile objects, the final state of files written, and the delivery of interactive I/O in an implementation-defined manner. See [intro.abstract]/6.

    So under this scheduling scheme the release/reacquire can be skipped without affecting observable behavior, and since the scheme itself is permissible, it defines one of the permissible choices of observable behavior. The compiler only has to ensure that some one of the permissible observable behaviors is realized. See [intro.abstract]/5.
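One way to make the subset argument concrete: the only way another thread can legitimately read x is under the same mutex. The observe function below is an illustration I'm adding, not something from the original post:

```cpp
#include <mutex>

std::mutex mutex;   // same declarations as in the question
int x = 0;

// An observer thread can only inspect x while holding the mutex.
int observe() {
    std::lock_guard<std::mutex> lock{mutex};
    return x;
}
```

With the original func, a concurrently running observe() may return 0, 1, or 2; with the coalesced func it may return only 0 or 2. Since the scheduler was never obliged to run the observer in the gap between the two critical sections, {0, 2} is a subset of the results already permitted for the original program, which is exactly the subset relation described above.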

    On the other hand, it seems plausible this optimisation could slippery-slope to every program containing a mutex being optimised into: lock mutex, do entire program, unlock mutex. Such behaviour would be obviously unhelpful for concurrent applications, so it might be reasonably forbidden.

    There is [intro.progress]/8, which recommends that the thread executing main and any threads created with std::thread/std::jthread should provide the defined concurrent forward-progress guarantees, which means that such a thread should, as long as it has not terminated, eventually make progress ([intro.progress]/6).

    For the purpose of making progress, per [intro.progress]/4, a blocking library call is considered to continuously check the condition that causes it to block, with each such check counting as making progress. The other execution steps that constitute progress are accesses through volatile glvalues and the completion of calls to library I/O functions, of synchronization operations, and of atomic operations.
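To illustrate which steps count as progress, consider a spin-wait whose body is nothing but an atomic load (a sketch; the names are mine):

```cpp
#include <atomic>

std::atomic<bool> done{false};

// Each load of `done` is an atomic operation, so every iteration of
// this loop is an execution step that makes progress in the
// [intro.progress] sense. The loop therefore may not be optimised
// away, unlike a loop with no volatile, atomic, I/O, or
// synchronization effects.
void wait_for_done() {
    while (!done.load(std::memory_order_acquire)) {
        // spin
    }
}
```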

    However, I don't think that this even forbids the transformation of

    while (true)
    {
        std::lock_guard<std::mutex> lock{mutex};
        //A
        x++; // (assume x is unsigned here)
    }
    

    into

    std::lock_guard<std::mutex> lock{mutex};
    while (true)
    {
        x++;
    }
    

    Even if another thread attempts to lock the mutex, the following scheduling behavior would not violate this forward-progress guarantee: repeatedly execute the loop up to //A, then switch to the other thread so that it "makes progress" by checking the blocking condition, then switch back and repeat.

    Forward progress and scheduling of threads is largely left as a quality-of-implementation decision to the implementation. See also N3209 discussing this from when multi-threaded execution was added to C++11.

    I do not expect that any compiler will make any attempt at such transformations at the code level.

    However, even common OS schedulers won't provide any strict guarantee that the scheduling scheme I described above won't happen. It generally just becomes increasingly improbable over longer time spans, because threads are preempted at more or less stochastically noisy intervals. If the scheduling does happen to be as described, then even without the compiler's transformation the behavior of the program will be as if the transformation had been made.

    I could imagine other environments where the described scheduling behavior may persist indefinitely. Suppose, for example, that mutex locking is implemented as a simple spin lock on a uniprocessor system, and that the scheduler is perfectly fair and deterministic in the number of instructions it lets each thread execute in any time slice. In such an environment, you may be unlucky enough that the thread holding the lock is always able to reacquire it within the same time slice in which it released it.
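A minimal test-and-set spin lock of the kind described might look like this (an illustrative sketch, not any real mutex implementation):

```cpp
#include <atomic>

// Simple test-and-set spin lock.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // On a uniprocessor with deterministic, fair time slices, the
        // thread that just released the lock may always manage to set
        // the flag again within its own slice, so a spinning waiter
        // sees the flag set on every one of its checks, indefinitely.
        while (flag.test_and_set(std::memory_order_acquire)) {
            // spin
        }
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```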

    Or multithreading may be completely cooperative, in which case there may be an issue if the mutex is not implemented to yield after an unlock.
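In such a cooperative setting, one mitigation is for the unlock operation itself to yield (a sketch under that assumption; the flag-based lock and the function names are mine):

```cpp
#include <atomic>
#include <thread>

std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void coop_lock() {
    while (lock_flag.test_and_set(std::memory_order_acquire)) {
        std::this_thread::yield();  // cooperative: give up the CPU while waiting
    }
}

void coop_unlock() {
    lock_flag.clear(std::memory_order_release);
    std::this_thread::yield();  // give a waiter a chance to run before we relock
}
```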

    I think the standard can't make any stronger guarantees than it does, in part because doing so would make supporting such environments impossible.