linux kernel rcu

How can we be sure that all call_rcu() callbacks have been invoked and completed in the Linux kernel?


I have a kmem_cache which contains RCU-protected data.

I also use call_rcu() after updating the RCU-protected pointers, so that the old elements are freed later.

Before destroying the cache I have to make sure that all slabs are freed.

The documentation says that both synchronize_rcu() and call_rcu() callbacks wait for the end of the current grace period. So there is no guarantee that, once synchronize_rcu() returns, all call_rcu() callbacks have completed and, in particular, have freed the slabs. Nor have I found a guarantee that call_rcu() callbacks are invoked in FIFO order, which would allow us to queue one more call_rcu() callback that destroys the cache.

So, how can I make sure that all RCU-protected data has been freed before destroying the cache?

P.S. In my case, on an x86-64 Linux kernel, synchronize_rcu() appears to wait until all call_rcu() callbacks have completed, so it happens to work, but relying on this looks like undefined behaviour as far as the documentation is concerned.


Solution

  • The answer is the rcu_barrier() function.

    From https://www.kernel.org/doc/Documentation/RCU/rcubarrier.txt:

    Unloading Modules That Use call_rcu()

    But what if p_callback is defined in an unloadable module?

    If we unload the module while some RCU callbacks are pending, the CPUs executing these callbacks are going to be severely disappointed when they are later invoked, as fancifully depicted at http://lwn.net/images/ns/kernel/rcu-drop.jpg.

    We could try placing a synchronize_rcu() in the module-exit code path, but this is not sufficient. Although synchronize_rcu() does wait for a grace period to elapse, it does not wait for the callbacks to complete.

    One might be tempted to try several back-to-back synchronize_rcu() calls, but this is still not guaranteed to work. If there is a very heavy RCU-callback load, then some of the callbacks might be deferred in order to allow other processing to proceed. Such deferral is required in realtime kernels in order to avoid excessive scheduling latencies.

    rcu_barrier()

    We instead need the rcu_barrier() primitive. Rather than waiting for a grace period to elapse, rcu_barrier() waits for all outstanding RCU callbacks to complete. Please note that rcu_barrier() does -not- imply synchronize_rcu(), in particular, if there are no RCU callbacks queued anywhere, rcu_barrier() is within its rights to return immediately, without waiting for a grace period to elapse.

    Pseudo-code using rcu_barrier() is as follows:

    1. Prevent any new RCU callbacks from being posted.
    2. Execute rcu_barrier().
    3. Allow the module to be unloaded.