I am reading the implementation of rigtorp's SPSCQueue, which is a very elegant design and has very good benchmark results.
I understand most of the design philosophy described in the README. What I cannot understand is how writeIdxCache_ and readIdxCache_ help.
Looking at the following function (case 0) for fetching an item from the queue:
RIGTORP_NODISCARD T *front() noexcept {
  auto const readIdx = readIdx_.load(std::memory_order_relaxed);
  if (readIdx == writeIdxCache_) {
    writeIdxCache_ = writeIdx_.load(std::memory_order_acquire);
    if (writeIdxCache_ == readIdx) {
      return nullptr;
    }
  }
  return &slots_[readIdx + kPadding];
}
If I compare the above implementation with the following ones, what is the benefit?

Case 1: What is the benefit of using writeIdxCache_ in the first place?
RIGTORP_NODISCARD T *front() noexcept {
  auto const readIdx = readIdx_.load(std::memory_order_relaxed);
  if (readIdx == writeIdx_.load(std::memory_order_acquire)) {
    return nullptr;
  }
  return &slots_[readIdx + kPadding];
}
Case 2: Here I also set readIdxCache_ when loading readIdx_. Does this approach still work, or would it break the queue? Is it better or worse than case 0?
RIGTORP_NODISCARD T *front() noexcept {
  readIdxCache_ = readIdx_.load(std::memory_order_relaxed);
  if (readIdxCache_ == writeIdxCache_) {
    writeIdxCache_ = writeIdx_.load(std::memory_order_acquire);
    if (writeIdxCache_ == readIdxCache_) {
      return nullptr;
    }
  }
  return &slots_[readIdxCache_ + kPadding];
}
For a CPU core to commit a store to a cache line, it needs exclusive ownership of that line (MESI Modified or Exclusive state). If one thread is writing a shared variable and no other thread is reading it, things are efficient: the cache line containing it can stay in that state.
But if another thread so much as reads it, its MESI share request will transition the cache line from Modified to Shared, so the writer will need a Read For Ownership before it can commit its next store (a Read For Ownership invalidates all other copies of the cache line in other cores). And the thread doing the reading has to wait for that cache miss (the share request) on its loads whenever the writer has written since its last read.
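As a toy illustration of that ping-pong (this is not part of SPSCQueue; the program and the name counter are made up for this answer), one thread keeps storing to a single atomic while another keeps loading it, so the cache line holding it bounces between Modified on the writer's core and Shared on both:

#include <atomic>
#include <cstdint>
#include <thread>

// Keep the variable on its own cache line so only true sharing is involved.
alignas(64) std::atomic<uint64_t> counter{0};
constexpr uint64_t kIters = 10'000'000;

int main() {
  std::thread writer([] {
    for (uint64_t i = 1; i <= kIters; ++i) {
      counter.store(i, std::memory_order_release); // needs the line in Modified/Exclusive state
    }
  });
  std::thread reader([] {
    uint64_t seen = 0;
    while (seen < kIters) {
      seen = counter.load(std::memory_order_acquire); // share request pulls the line back
    }
  });
  writer.join();
  reader.join();
}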
Without the index caches, every call to front() reads both shared variables, including the one the other thread sometimes writes. This creates contention on writeIdx_, since cache-coherency messages for it have to be sent for most reads and most writes. (And the writer similarly creates contention on readIdx_ by reading it on every call.)
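For symmetry, here is a simplified sketch of the writer side without a readIdxCache_ (not the actual rigtorp code; try_push and capacity_ are assumed names here, while slots_, kPadding, readIdx_ and writeIdx_ match the snippets above). Note the unconditional acquire load of readIdx_, the variable the consumer keeps storing to:

bool try_push(const T &value) noexcept {
  auto const writeIdx = writeIdx_.load(std::memory_order_relaxed);
  auto nextWriteIdx = writeIdx + 1;
  if (nextWriteIdx == capacity_) {
    nextWriteIdx = 0;
  }
  // Every call reads readIdx_, so every call can pull the consumer's cache
  // line into Shared state and force the consumer's next store to re-acquire it.
  if (nextWriteIdx == readIdx_.load(std::memory_order_acquire)) {
    return false; // queue looks full
  }
  slots_[writeIdx + kPadding] = value;
  writeIdx_.store(nextWriteIdx, std::memory_order_release);
  return true;
}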
With the index caches, front() only touches writeIdx_ occasionally, only after all the queue entries it saw last time have been read. The writer similarly avoids touching readIdx_ often, so if you're removing entries from the queue, the writes to readIdx_ can stay efficient.
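Under the same assumptions as the sketch above, the cached variant only falls back to the acquire load of readIdx_ when the cached value makes the queue look full; it is the writer-side mirror of the front() in case 0:

bool try_push(const T &value) noexcept {
  auto const writeIdx = writeIdx_.load(std::memory_order_relaxed);
  auto nextWriteIdx = writeIdx + 1;
  if (nextWriteIdx == capacity_) {
    nextWriteIdx = 0;
  }
  if (nextWriteIdx == readIdxCache_) {
    // Only now touch the consumer's variable; once the consumer has moved on,
    // many subsequent pushes proceed without loading readIdx_ again.
    readIdxCache_ = readIdx_.load(std::memory_order_acquire);
    if (nextWriteIdx == readIdxCache_) {
      return false; // still full
    }
  }
  slots_[writeIdx + kPadding] = value;
  writeIdx_.store(nextWriteIdx, std::memory_order_release);
  return true;
}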