it is right? is there has any data race?
a read-write lock write less and read more write operate not important,it can starve
#include<vector>
#include<atomic>
std::vector<int> val;
std::atomic_uint32_t lock;
void Append(int v) {
uint32_t expect = 0;
// write lock
while(!lock.compare_exchange_strong(expect, 1 << 31));
// do some read
val.push_back(v);
// write unlock
lock.fetch_xor(1 << 31);
}
int Get(size_t i) {
// read lock
lock.fetch_add(1);
while(lock.load() & (1 << 31));
// do some read
int v = val[i];
// read unlock
lock.fetch_sub(1);
return v;
}
This isn't lock-free of course, you're intentionally hand-rolling a readers/writers spinlock with busy-waits in both cases. If a writer sleeps forever, no other threads max any progress.
Looks ok to me for correctness, but not efficient if a writer is waiting for long. (Or if readers are waiting while a writer grows the array with a realloc + copy). And not even an x86 _mm_pause()
in the spin-loops, let alone a C++20 lock.wait()
and lock.notify_all()
which could let it sleep and have the OS wake up readers when a writer calls notify_all. (Writers would probably always have to call notify_all, making them more expensive, unless you add extra state that readers update before sleeping.)
C++17 std::shared_lock
vs. std::unique_lock
on a std::mutex
already implements this, but with fallback to a sleep if the lock isn't available. See Reader/Writer Locks in C++. (Not sure how it prioritizes, like if it spends extra effort to try to avoid starvation or not. It seems you're intentionally prioritizing reads over writes, which a standard library lock probably doesn't; I'd guess that out of the box they'd be designed to avoid starvation.)
You could create a container that allowed concurrent append, with an atomic fetch_add
incrementing a write_pos
counter up to a limit, only taking a unique_lock if you need to reallocate. (Keep that counter in a separate cache line from the pointer to the data, to avoid contention with reads.) A bit like a lock-free queue in a circular buffer, but much simpler because you're only growing and there's no bounds-checking in the reader.
If read-side scalability is actually important, you could consider RCU for near-perfect reader concurrency, being truly read-only and not taking any lock. If the array isn't too huge to copy, and you don't need to reallocate it too often, that could work well. Or combine with the previous idea: reserve extra space for atomic growth, so you only need an RCU reallocate when it gets to the end of the size you've reserved. That can make actual read-copy-update operations rare enough to be usable even with a decent frequency of writes. (And readers can still be reading fully wait-free while the copy/update is running; writers might use a lock to prevent too much wasted work with multiple writers copying at once.)
Other minor things about your implementation:
The operations can be std::memory_order_acquire
for taking the lock, and release
for releasing. That's the same strength C++ std::mutex
operations have.
You can use the return value of .fetch_add
instead of doing a separate seq_cst load.
static const uint32_t WRITE_LOCKED = 1UL << 31;
int Get(size_t i) {
// read lock
auto lockval = lock.fetch_add(1, std::memory_order_acquire);
while(lockval & WRITE_LOCKED){ // wait for any writers to finish
#ifdef __x86_64__
_mm_pause(); // or boost::detail::sp_thread_pause(), Rust std::hint::spin_loop(), C# SpinOnce() or equivalent.
// TODO: yield() or sleep() or something after a few iters
// or lock.wait(lockval) and have writers call .notify() if there are any readers waiting.
#endif
lockval = lock.load(std::memory_order_acquire);
}
// do some read
int v = val[i];
lock.fetch_sub(1, std::memory_order_release); // read unlock
return v;
}
Also, the writer could spin read-only if it doesn't get the lock the first try, waiting until it looks like it's available before trying another compare_exchange_weak()
. That will disrupt readers somewhat less, especially if you put a pause
or two in its spin loop so it's not spamming attempts.
You could give the writer a timeout of 500 iterations or something before it sets the bit and waits for all the readers to finish, preventing new readers from entering. Except you'd need to change something to be able to detect that: new readers increment the count even if it's write-locked so you can't tell the difference between waiting for old readers to finish vs. new readers waiting to start. (That's fine for your algorithm, but it doesn't give a writer any way of boosting its priority.)
If you want that, readers would have to work differently somehow. But making their increment conditional on the high bit might require using a compare_exchange
retry loop in the reader, instead of just fetch_add, increasing contention between readers. So maybe there's a more clever way to go about it.