c++, locking, lock-free, stdatomic, spinlock

Hand-rolled readers-writers spin lock with priority for readers (write starvation is ok)


Is this correct? Is there any data race?

It's a read-write lock for a workload with many reads and few writes; the writes aren't important, so writers are allowed to starve.


#include <vector>
#include <atomic>
#include <cstdint>
#include <cstddef>

std::vector<int> val;
std::atomic_uint32_t lock{0};

void Append(int v) {
  uint32_t expect = 0;
  // write lock
  while(!lock.compare_exchange_strong(expect, 1 << 31));
  // do the write
  val.push_back(v);
  // write unlock
  lock.fetch_xor(1 << 31);
}

int Get(size_t i) {
  // read lock
  lock.fetch_add(1);
  while(lock.load() & (1 << 31));
  // do some read
  int v = val[i];
  // read unlock
  lock.fetch_sub(1);
  return v;
}

Solution

  • This isn't lock-free, of course; you're intentionally hand-rolling a readers/writers spinlock with busy-waits in both cases. If a writer sleeps forever, no other thread makes any progress.

    Looks ok to me for correctness, but it's not efficient if a writer has to wait for a long time. (Or if readers are waiting while a writer grows the array with a realloc + copy.) And there isn't even an x86 _mm_pause() in the spin loops, let alone a C++20 lock.wait() and lock.notify_all(), which could let a waiter sleep and have the OS wake up readers when a writer calls notify_all(). (Writers would probably always have to call notify_all, making them more expensive, unless you add extra state that readers update before sleeping.)
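
    For example, a sleeping version of the reader's wait loop could use the C++20 atomic wait/notify API, roughly along these lines (a sketch reusing the question's lock variable; WRITE_LOCKED is just my name for the question's 1 << 31 bit):

    constexpr uint32_t WRITE_LOCKED = uint32_t(1) << 31;

    void ReadLock() {
        uint32_t v = lock.fetch_add(1, std::memory_order_acquire) + 1;
        while (v & WRITE_LOCKED) {
            lock.wait(v);              // C++20: sleep until the value changes from v
            v = lock.load(std::memory_order_acquire);
        }
    }

    void WriteUnlock() {
        lock.fetch_xor(WRITE_LOCKED, std::memory_order_release);
        lock.notify_all();             // wake any readers sleeping in lock.wait()
    }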

    C++17 std::shared_lock vs. std::unique_lock on a std::shared_mutex already implements this, but with a fallback to sleeping if the lock isn't available. See Reader/Writer Locks in C++. (Not sure how it prioritizes, e.g. whether it spends extra effort to avoid starvation or not. It seems you're intentionally prioritizing reads over writes, which a standard library lock probably doesn't; I'd guess that out of the box they'd be designed to avoid starvation.)
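
    A minimal sketch of that standard-library version, for comparison:

    #include <shared_mutex>
    #include <vector>
    #include <cstddef>

    std::vector<int> val;
    std::shared_mutex rw;               // C++17

    void Append(int v) {
        std::unique_lock lk(rw);        // exclusive: excludes readers and other writers
        val.push_back(v);
    }

    int Get(size_t i) {
        std::shared_lock lk(rw);        // shared: many readers can hold it concurrently
        return val[i];
    }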


    You could create a container that allowed concurrent append, with an atomic fetch_add incrementing a write_pos counter up to a limit, only taking a unique_lock if you need to reallocate. (Keep that counter in a separate cache line from the pointer to the data, to avoid contention with reads.) A bit like a lock-free queue in a circular buffer, but much simpler because you're only growing and there's no bounds-checking in the reader.
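
    A rough sketch of that idea, assuming a fixed pre-reserved capacity (the realloc-under-unique_lock path and the cache-line separation are left out; write_pos, commit_pos and capacity are just names for this sketch):

    #include <atomic>
    #include <cassert>
    #include <cstddef>

    constexpr size_t capacity = 1024;   // pre-reserved; growing is not shown
    int data[capacity];
    std::atomic<size_t> write_pos{0};   // next slot to claim
    std::atomic<size_t> commit_pos{0};  // slots [0, commit_pos) are fully written

    void Append(int v) {
        size_t idx = write_pos.fetch_add(1, std::memory_order_relaxed);
        assert(idx < capacity);         // real code: fall back to unique_lock + realloc
        data[idx] = v;
        // Publish in order: wait for earlier appends to commit, then advance.
        while (commit_pos.load(std::memory_order_acquire) != idx) { /* spin */ }
        commit_pos.store(idx + 1, std::memory_order_release);
    }

    int Get(size_t i) {
        size_t n = commit_pos.load(std::memory_order_acquire);
        assert(i < n);                  // only committed slots are readable
        return data[i];
    }

    The in-order commit spin does mean a stalled writer delays later writers from publishing, which is the usual trade-off with this kind of claim-then-commit scheme.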

    If read-side scalability is actually important, you could consider RCU for near-perfect reader concurrency, being truly read-only and not taking any lock. If the array isn't too huge to copy, and you don't need to reallocate it too often, that could work well. Or combine with the previous idea: reserve extra space for atomic growth, so you only need an RCU reallocate when it gets to the end of the size you've reserved. That can make actual read-copy-update operations rare enough to be usable even with a decent frequency of writes. (And readers can still be reading fully wait-free while the copy/update is running; writers might use a lock to prevent too much wasted work with multiple writers copying at once.)
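
    Without an RCU library, a crude portable approximation of that is copy-on-write through an atomically swapped shared_ptr. This is not real RCU (readers still touch a shared reference count, and std::atomic<std::shared_ptr> may be lock-based in some standard libraries), but it shows the read-copy-update shape; the names here are just for the sketch:

    #include <atomic>
    #include <cstddef>
    #include <memory>
    #include <mutex>
    #include <vector>

    std::atomic<std::shared_ptr<const std::vector<int>>> snapshot{
        std::make_shared<const std::vector<int>>()};    // C++20 atomic<shared_ptr>
    std::mutex writer_mtx;          // serialize writers so they don't duplicate copies

    int Get(size_t i) {
        auto snap = snapshot.load();    // readers never wait on writers
        return (*snap)[i];
    }

    void Append(int v) {
        std::lock_guard lk(writer_mtx);
        auto old_snap = snapshot.load();
        auto new_snap = std::make_shared<std::vector<int>>(*old_snap);  // copy
        new_snap->push_back(v);                                         // update
        snapshot.store(std::move(new_snap));                            // publish
    }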


    Other minor things about your implementation:

    #include <cstdint>
    #ifdef __x86_64__
    #include <immintrin.h>      // for _mm_pause
    #endif

    static const uint32_t WRITE_LOCKED = 1UL << 31;
    int Get(size_t i) {
      // read lock
      auto lockval = lock.fetch_add(1, std::memory_order_acquire);
      while(lockval & WRITE_LOCKED){     // wait for any writers to finish
    #ifdef __x86_64__
          _mm_pause();        // or boost::detail::sp_thread_pause(), Rust std::hint::spin_loop(), C# SpinOnce() or equivalent.
          // TODO: yield() or sleep() or something after a few iters
          // or lock.wait(lockval)  and have writers call .notify() if there are any readers waiting.
    #endif 
          lockval = lock.load(std::memory_order_acquire);
      }
    
      // do some read
      int v = val[i];
    
      lock.fetch_sub(1, std::memory_order_release);    // read unlock
      return v;
    }
    

    Also, the writer could spin read-only if it doesn't get the lock the first try, waiting until it looks like it's available before trying another compare_exchange_weak(). That will disrupt readers somewhat less, especially if you put a pause or two in its spin loop so it's not spamming attempts.
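
    Something like this sketch, reusing the lock variable and WRITE_LOCKED from above:

    void WriteLock() {
        uint32_t expect = 0;
        while (!lock.compare_exchange_weak(expect, WRITE_LOCKED,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed)) {
            // Spin read-only until the lock looks free, instead of hammering
            // the cache line with failed RMW attempts.
            do {
    #ifdef __x86_64__
                _mm_pause();
    #endif
            } while (lock.load(std::memory_order_relaxed) != 0);
            expect = 0;     // CAS failure overwrote expect; reset before retrying
        }
    }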

    You could give the writer a timeout of 500 iterations or something before it sets the bit and waits for all the readers to finish, preventing new readers from entering. Except you'd need to change something to be able to detect that: new readers increment the count even if it's write-locked so you can't tell the difference between waiting for old readers to finish vs. new readers waiting to start. (That's fine for your algorithm, but it doesn't give a writer any way of boosting its priority.)

    If you want that, readers would have to work differently somehow. But making their increment conditional on the high bit might require using a compare_exchange retry loop in the reader, instead of just fetch_add, increasing contention between readers. So maybe there's a more clever way to go about it.
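
    For what it's worth, the straightforward (but more contended) version of that conditional reader increment would be a CAS retry loop, something like:

    // Readers don't bump the count while the write bit is set, so a writer that
    // sets the bit blocks new readers and can wait for the old ones to drain.
    void ReadLockNoBargeIn() {
        uint32_t cur = lock.load(std::memory_order_relaxed);
        while (true) {
            if (cur & WRITE_LOCKED) {          // a writer holds or wants the lock
    #ifdef __x86_64__
                _mm_pause();
    #endif
                cur = lock.load(std::memory_order_relaxed);
                continue;
            }
            if (lock.compare_exchange_weak(cur, cur + 1,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed))
                return;                        // on failure, cur was reloaded; retry
        }
    }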