c++ shared-memory circular-buffer lockless xenomai

How to implement a ring buffer safely in shared memory when the consumer is operating in a real-time context


My situation is this: On a Linux machine I have a shared-memory region inside which lives a ring-buffer of audio samples. This ring-buffer's consumer is a hard-real-time Xenomai audio-callback, so it is not allowed to do any kind of mutex-locking, etc. The producer is a regular Linux process, scheduled via the SCHED_FIFO scheduler but otherwise allowed to do whatever it likes.

Here is a (very simplified) implementation of the C++ class I use to implement the shared-memory-ring-buffer:

const size_t NUM_SAMPLES_PER_RENDER_BUFFER = 1024;  // producer renders 1024 audio samples at a time
const size_t NUM_RENDER_BUFFERS            = 4;     // quadruple-buffering, for now

struct SharedMemoryAudioData
{
public:
   std::atomic<uint64_t> _totalSamplesProduced; /* since program start */
   std::atomic<uint64_t> _totalSamplesConsumed; /* since program start */

   float _samplesRingBuffer[NUM_SAMPLES_PER_RENDER_BUFFER*NUM_RENDER_BUFFERS];
};

... and the consumer callback operates something like this (pseudocode):

void RealTimeAudioCallback(float * writeSamplesToHere, uint32_t numSamplesToWrite)
{
   const size_t ringBufSize = NUM_SAMPLES_PER_RENDER_BUFFER*NUM_RENDER_BUFFERS;
   size_t readIdx = (size_t) (sharedData._totalSamplesConsumed.load() % ringBufSize);
   for (size_t i=0; i<numSamplesToWrite; i++)
   {
      writeSamplesToHere[i] = sharedData._samplesRingBuffer[readIdx++];
      if (readIdx >= ringBufSize) readIdx = 0;
   }
   sharedData._totalSamplesConsumed = sharedData._totalSamplesConsumed.load() + numSamplesToWrite;
}

The producer code is slightly more elaborate; it runs periodically and compares _totalSamplesProduced and _totalSamplesConsumed and determines how many more samples it is appropriate to write into the shared-memory region to keep the ring-buffer filled up (i.e. it writes as many samples as possible without risking overwriting the area of the ring-buffer that the consumer is currently reading from).
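A minimal sketch of that free-space check, assuming the two counters only ever increase and the producer never gets more than one ring-buffer ahead. The helper name WritableSamples and the constant kRingBufSize are hypothetical, introduced only for illustration:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed to equal NUM_SAMPLES_PER_RENDER_BUFFER * NUM_RENDER_BUFFERS (1024 * 4).
constexpr std::size_t kRingBufSize = 1024 * 4;

// Hypothetical helper: how many samples the producer may write right now
// without overwriting samples the consumer has not finished reading.
inline std::size_t WritableSamples(const std::atomic<uint64_t>& produced,
                                   const std::atomic<uint64_t>& consumed)
{
   // Only the producer writes this counter, so a relaxed load is enough here.
   const uint64_t p = produced.load(std::memory_order_relaxed);
   // Acquire so that the consumer's reads of the old samples are known to be finished.
   const uint64_t c = consumed.load(std::memory_order_acquire);
   const uint64_t inFlight = p - c;   // written but not yet consumed
   return (inFlight >= kRingBufSize) ? 0
                                     : static_cast<std::size_t>(kRingBufSize - inFlight);
}
```

The producer would call this once per wake-up and render at most that many samples before publishing the new value of _totalSamplesProduced.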

This all seems to be working well, and accesses to the _totalSamplesProduced and _totalSamplesConsumed member-variables are concurrency-safe since they are both std::atomic.

My question is about the safety of the access to the values in _samplesRingBuffer itself. They are also being written by the producer thread and read by the consumer thread, but they are not std::atomic, so it seems likely that this is technically undefined behavior. One solution would be to change the array's type to std::atomic<float>, but I suspect that could add a significant performance cost, so I'd rather not do that.

Is there something else I should be doing here to help make the shared accesses to _samplesRingBuffer more concurrency-safe, while still maintaining high efficiency? (I realize that there is always the chance of the producer not producing in a timely manner, which would result in an underrun and likely an audio glitch, but it doesn't seem to happen in practice, and anyway I don't think there is anything I can do about that.)


Solution

  • The problem arises if the producer never touches _totalSamplesConsumed and the consumer never touches _totalSamplesProduced: then the two never synchronize on anything, and accessing the samples is indeed a data race.

    However, the consumer should read _totalSamplesProduced anyway, because it should cap the number of samples it reads at the number of samples available. Doing that is sufficient to synchronize the producer and the consumer, so that the samples written by the producer can be safely consumed.

    It's also sufficient for the producer to release, and the consumer to acquire, _totalSamplesProduced; you don't need the default sequential consistency you're using now.
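    A minimal sketch of that release/acquire pairing, assuming a single producer and a single consumer. The function names ProduceSamples and ConsumeSamples and the constant kRingBufSize are hypothetical; the struct mirrors the one in the question. Padding with silence on underrun is also an assumption, not something the question specifies:

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRingBufSize = 1024 * 4;  // assumed ring-buffer size

struct SharedMemoryAudioData
{
   std::atomic<uint64_t> _totalSamplesProduced{0};
   std::atomic<uint64_t> _totalSamplesConsumed{0};
   float _samplesRingBuffer[kRingBufSize] = {};
};

// Producer: write the samples first, then publish them with a release store.
// Assumes the caller already verified that `count` samples fit in the free space.
inline void ProduceSamples(SharedMemoryAudioData& d, const float* src, std::size_t count)
{
   const uint64_t produced = d._totalSamplesProduced.load(std::memory_order_relaxed);
   for (std::size_t i = 0; i < count; i++)
      d._samplesRingBuffer[(produced + i) % kRingBufSize] = src[i];
   d._totalSamplesProduced.store(produced + count, std::memory_order_release);
}

// Consumer: the acquire load pairs with the release store above, so every
// sample written before that store is visible once the new count is seen.
// Returns the number of real samples copied; the rest of dst is zero-filled.
inline std::size_t ConsumeSamples(SharedMemoryAudioData& d, float* dst, std::size_t count)
{
   const uint64_t consumed = d._totalSamplesConsumed.load(std::memory_order_relaxed);
   const uint64_t produced = d._totalSamplesProduced.load(std::memory_order_acquire);
   const std::size_t n =
      static_cast<std::size_t>(std::min<uint64_t>(count, produced - consumed));
   for (std::size_t i = 0; i < n; i++)
      dst[i] = d._samplesRingBuffer[(consumed + i) % kRingBufSize];
   for (std::size_t i = n; i < count; i++)
      dst[i] = 0.0f;  // underrun: pad with silence (assumed policy)
   // Release so the producer, after an acquire load, knows these reads are done.
   d._totalSamplesConsumed.fetch_add(n, std::memory_order_release);
   return n;
}
```

    The plain float array stays non-atomic here; it is the ordering on the two counters that makes the accesses to it race-free.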

    One other bug:

    sharedData._totalSamplesConsumed =
      sharedData._totalSamplesConsumed.load()
      + numSamplesToWrite;
    

    is not atomic: the load and the store are two separate operations rather than a single read-modify-write. Use fetch_add instead.
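
    As a small sketch of that fix, the update can be written as one atomic read-modify-write. The helper name AdvanceConsumed is hypothetical, and the release ordering is an assumption chosen to match the release/acquire scheme described above:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: advance the consumed counter in a single atomic step.
// Returns the new total.
inline uint64_t AdvanceConsumed(std::atomic<uint64_t>& totalSamplesConsumed,
                                uint32_t numSamples)
{
   // fetch_add returns the value held *before* the addition.
   const uint64_t previous =
      totalSamplesConsumed.fetch_add(numSamples, std::memory_order_release);
   return previous + numSamples;
}
```

    In the callback this would replace the load-then-assign statement, e.g. AdvanceConsumed(sharedData._totalSamplesConsumed, numSamplesToWrite).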