My situation is this: On a Linux machine I have a shared-memory region inside which lives a ring-buffer of audio samples. This ring-buffer's consumer is a hard-real-time Xenomai audio-callback, so it is not allowed to do any kind of mutex-locking, etc. The producer is a regular Linux process, scheduled via the SCHED_FIFO scheduler but otherwise allowed to do whatever it likes.
Here is a (very simplified) implementation of the C++ class I use to implement the shared-memory-ring-buffer:

    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    const size_t NUM_SAMPLES_PER_RENDER_BUFFER = 1024; // producer renders 1024 audio samples at a time
    const size_t NUM_RENDER_BUFFERS = 4; // quadruple-buffering, for now

    struct SharedMemoryAudioData
    {
    public:
        std::atomic<uint64_t> _totalSamplesProduced; /* since program start */
        std::atomic<uint64_t> _totalSamplesConsumed; /* since program start */
        float _samplesRingBuffer[NUM_SAMPLES_PER_RENDER_BUFFER*NUM_RENDER_BUFFERS];
    };

... and the consumer callback operates something like this (pseudocode):

    void RealTimeAudioCallback(float * writeSamplesToHere, uint32 numSamplesToWrite)
    {
        const size_t ringBufSize = NUM_SAMPLES_PER_RENDER_BUFFER*NUM_RENDER_BUFFERS;
        size_t readIdx = (size_t) (sharedData._totalSamplesConsumed.load() % ringBufSize);
        for (size_t i=0; i<numSamplesToWrite; i++)
        {
            writeSamplesToHere[i] = sharedData._samplesRingBuffer[readIdx++];
            if (readIdx >= ringBufSize) readIdx = 0;
        }
        sharedData._totalSamplesConsumed = sharedData._totalSamplesConsumed.load() + numSamplesToWrite;
    }

The producer code is slightly more elaborate; it runs periodically, compares _totalSamplesProduced and _totalSamplesConsumed, and determines how many more samples it is appropriate to write into the shared-memory region to keep the ring-buffer filled up (i.e. it writes as many samples as possible without risking overwriting the area of the ring-buffer that the consumer is currently reading from).
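The free-space calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `WritableSamples` is a hypothetical helper name, not part of the real code, and it assumes the two counters have already been loaded from the atomics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many samples the producer may write right now
// without overwriting samples that the consumer has not yet marked as consumed.
inline std::size_t WritableSamples(std::uint64_t totalProduced,
                                   std::uint64_t totalConsumed,
                                   std::size_t ringBufSize)
{
    assert(ringBufSize > 0);

    // If the consumer has caught up with (or, after an underrun, run past)
    // the producer, nothing unread remains and the whole ring is free.
    if (totalConsumed >= totalProduced) return ringBufSize;

    // Samples written but not yet consumed; these must not be overwritten.
    const std::uint64_t unread = totalProduced - totalConsumed;

    // Defensive clamp: should not trigger if the producer always respects this limit.
    if (unread >= ringBufSize) return 0;

    return ringBufSize - static_cast<std::size_t>(unread);
}
```

Because the consumer only advances _totalSamplesConsumed after it has finished copying, the region it is currently reading still counts as unread here and is left alone.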
This all seems to be working well, and accesses to the _totalSamplesProduced and _totalSamplesConsumed member-variables are concurrency-safe since they are both std::atomic.
My question is about the safety of the access to the values in _samplesRingBuffer itself. They are also being written by the producer thread and read by the consumer thread, but they are not std::atomic, so it seems likely that this is technically undefined behavior. One solution would be to change the array's type to std::atomic<float>, but I suspect that could add a significant performance cost, so I'd rather not do that.
Is there something else I should be doing here to help make the shared accesses to _samplesRingBuffer more concurrency-safe, while still maintaining high efficiency? (I realize that there is always the chance of the producer not producing in a timely manner, which would result in an underrun and likely an audio glitch, but it doesn't seem to happen in practice, and anyway I don't think there is anything I can do about that.)
The problem arises if the producer doesn't touch _totalSamplesConsumed and the consumer doesn't touch _totalSamplesProduced: then they never synchronize on anything, and accessing the samples is indeed a data race.
However, the consumer should read _totalSamplesProduced anyway, because it should cap the number of samples it reads to the number of samples available. Doing that is sufficient to synchronize the producer and consumer, so that the samples written by the producer can be safely consumed.
It's also sufficient for the producer to release, and the consumer to acquire, _totalSamplesProduced; you don't need the default sequential consistency you're using now.
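As a rough sketch of what that release/acquire pairing could look like (not the asker's actual code): `SharedAudio`, `Produce`, and `Consume` are hypothetical names, the shared-memory mapping is left out, and filling the rest of the output with silence on underrun is an assumption. The sketch also applies the same pairing in the other direction on the consumed counter, so that the consumer's reads are finished before the producer reuses those slots.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRingBufSize = 1024 * 4;  // same capacity as in the question

struct SharedAudio  // hypothetical stand-in for SharedMemoryAudioData
{
    std::atomic<std::uint64_t> totalProduced{0};
    std::atomic<std::uint64_t> totalConsumed{0};
    float ring[kRingBufSize];  // plain floats; no per-sample atomics needed
};

// Producer: write plain floats, then publish them with a release store.
// Returns the number of samples actually written.
std::size_t Produce(SharedAudio & s, const float * src, std::size_t n)
{
    // Only the producer writes totalProduced, so a relaxed load is enough here.
    const std::uint64_t produced = s.totalProduced.load(std::memory_order_relaxed);
    // Acquire pairs with the consumer's release store: the consumer is done
    // reading every slot below `consumed`, so those slots may be overwritten.
    const std::uint64_t consumed = s.totalConsumed.load(std::memory_order_acquire);
    assert(produced >= consumed && produced - consumed <= kRingBufSize);

    const std::size_t freeSlots = kRingBufSize - static_cast<std::size_t>(produced - consumed);
    n = std::min(n, freeSlots);
    for (std::size_t i = 0; i < n; i++)
        s.ring[(produced + i) % kRingBufSize] = src[i];

    // Release: the sample writes above become visible before the new count.
    s.totalProduced.store(produced + n, std::memory_order_release);
    return n;
}

// Consumer: cap the read to what has been published; returns samples copied.
std::size_t Consume(SharedAudio & s, float * dst, std::size_t n)
{
    const std::uint64_t consumed = s.totalConsumed.load(std::memory_order_relaxed);
    // Acquire pairs with the producer's release store above.
    const std::uint64_t produced = s.totalProduced.load(std::memory_order_acquire);
    assert(produced >= consumed);

    const std::size_t toCopy = std::min(n, static_cast<std::size_t>(produced - consumed));
    for (std::size_t i = 0; i < toCopy; i++)
        dst[i] = s.ring[(consumed + i) % kRingBufSize];
    for (std::size_t i = toCopy; i < n; i++)
        dst[i] = 0.0f;  // underrun: output silence (an assumption of this sketch)

    // Release: the reads above are complete before the slots are handed back.
    s.totalConsumed.store(consumed + toCopy, std::memory_order_release);
    return toCopy;
}
```

Both functions are lock-free and never block, which is what the real-time callback needs; the only cost over the original is the extra atomic load on each side.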
One other bug:

    sharedData._totalSamplesConsumed =
        sharedData._totalSamplesConsumed.load()
        + numSamplesToWrite;

is not atomic: it is a separate load followed by a separate store, not a single read-modify-write. Use fetch_add instead.
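A minimal sketch of the fetch_add version, with `AdvanceConsumed` as a hypothetical helper name. Using release ordering here is an assumption that the producer reads the counter with an acquire load; the default sequentially consistent ordering would also be correct.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: advance the consumed counter in one atomic
// read-modify-write and return the new total.
inline std::uint64_t AdvanceConsumed(std::atomic<std::uint64_t> & totalConsumed,
                                     std::uint32_t numSamples)
{
    // fetch_add returns the value held *before* the addition.
    const std::uint64_t previous =
        totalConsumed.fetch_add(numSamples, std::memory_order_release);
    return previous + numSamples;
}
```

In the callback this would replace the final assignment, e.g. `AdvanceConsumed(sharedData._totalSamplesConsumed, numSamplesToWrite);`.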