
Stopping multiple threads waiting on RwLock::write early in Rust


My Rust code uses an RwLock to process data in multiple threads. Each thread fills a common storage while holding the read lock (e.g. filling up a database, though my case is a bit different). Eventually, the common storage fills up, and I need to pause all processing, reallocate storage space (e.g. allocate more disk space from the cloud), and continue.

// pseudo-code
fn thread_worker(tasks) {
    let mut lock = rwlock.read().unwrap();
    for task in tasks {
        // please ignore the out_of_space check race condition;
        // it's here just to explain the question
        if out_of_space {
            drop(lock);
            let write_lock = rwlock.write().unwrap();
            // get more storage
            drop(write_lock);
            lock = rwlock.read().unwrap();
        }
        // handle task WITHOUT getting a read lock on every pass;
        // getting a lock is far costlier than the actual task processing
    }
    drop(lock);
}

Since all threads hit out of space at about the same time, they all release the read lock and request the write lock. The first thread to get the write lock fixes the storage issue. But now I have a possible temporary deadlock: all the other threads are also waiting for the write lock, even though they no longer need it.

So the following can happen. Given 3 threads all waiting for the write lock, the 1st gets it, fixes the issue, releases it, and waits for a read lock. The 2nd gets the write lock, sees that the issue is already fixed, and quickly releases it. The 1st and 2nd threads then take read locks and continue processing, but the 3rd is still waiting for the write lock, and will keep waiting for a very long time, until the first two either run out of space again or finish all their work.

Given all threads waiting for the write lock, how can the first thread, after it finishes its work but before it releases the write lock it already holds, "abort" all the other threads' waits?

I saw there is a poisoning feature, but it was designed for panics; reusing it in production code seems wrong and tricky to get right. Also, the Rust developers are considering removing it.

P.S. Each loop iteration is essentially a data[index] = value assignment, where data is a giant memmap shared by many threads. The index grows slowly in all threads, so eventually every thread runs past the end of the memmap. When that happens, the memmap is destroyed, the file is reallocated, and a new memmap is created. Thus, taking a read lock on every loop iteration is not an option: it would cost far more than the assignment itself.


Solution

  • Looking at your code, you could get away with an extra mutex:

    // pseudo-code
    fn thread_worker(tasks) {
        let mut lock = rwlock.read().unwrap();
        for task in tasks {
            if out_of_space {
                drop(lock);
                {
                    let _guard = mutex.lock().unwrap();
                    if out_of_space { // potentially updated by another worker
                        let write_lock = rwlock.write().unwrap();
                        // get more storage
                        ...
                        // drop(write_lock); is automatic here
                    }
                    // drop(_guard); is automatic here
                }
                lock = rwlock.read().unwrap();
            }

            // copy memory for the task
            ...
        }
        // drop(lock); is automatic here
    }
    

    The pattern used here is known as double-checked locking.

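    This solves your problem: after a reallocation, the next worker will not wait on rwlock.write() indefinitely, because it will fail the out_of_space check inside the mutex's critical section and go straight back to rwlock.read(). For this to work, out_of_space must be re-evaluated against shared state that the reallocating worker updates (e.g. the current storage size), not a stale per-thread value.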
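
    A self-contained, runnable sketch of this double-checked locking follows. It is not the answer's code verbatim and rests on assumptions that are not in the question: the memmap is replaced by a `Vec<AtomicU64>` (so `data[index] = value` becomes an atomic store through the read guard), the first out_of_space check becomes `index >= guard.len()`, the second one reads a hypothetical `capacity` atomic, and the storage grows by doubling. `Shared`, `worker` and `run` are illustrative names.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for the memmap: atomics can be written through a read guard.
pub struct Shared {
    data: RwLock<Vec<AtomicU64>>,
    /// Current storage length; only changed while holding both `realloc` and
    /// the write lock, so it can be checked without touching the RwLock.
    capacity: AtomicUsize,
    /// The extra mutex: serializes reallocation.
    realloc: Mutex<()>,
}

fn worker(shared: &Shared, indices: Range<usize>) {
    let mut guard = shared.data.read().unwrap();
    for index in indices {
        // First check, using the read guard we already hold (no mutex).
        while index >= guard.len() {
            drop(guard); // let a writer in
            {
                let _realloc = shared.realloc.lock().unwrap();
                // Second check, under the mutex: if another worker already grew
                // the storage, skip write() entirely.
                if index >= shared.capacity.load(Ordering::Acquire) {
                    let mut data = shared.data.write().unwrap();
                    let new_len = (index + 1).max(data.len() * 2);
                    data.resize_with(new_len, || AtomicU64::new(0));
                    shared.capacity.store(new_len, Ordering::Release);
                } // the write guard `data` (if taken) is released here
            } // `_realloc` is released here
            guard = shared.data.read().unwrap();
        }
        // The per-task work: a plain store under the read guard.
        guard[index].store(index as u64, Ordering::Relaxed);
    }
}

/// Runs `threads` workers over disjoint slices of `0..total`, starting from a
/// 4-slot storage, and returns the first `total` slots.
pub fn run(threads: usize, total: usize) -> Vec<u64> {
    let shared = Shared {
        data: RwLock::new((0..4).map(|_| AtomicU64::new(0)).collect()),
        capacity: AtomicUsize::new(4),
        realloc: Mutex::new(()),
    };
    let chunk = total / threads;
    thread::scope(|s| {
        for t in 0..threads {
            // The last worker also takes any remainder.
            let end = if t + 1 == threads { total } else { (t + 1) * chunk };
            let shared = &shared;
            s.spawn(move || worker(shared, t * chunk..end));
        }
    });
    let data = shared.data.into_inner().unwrap();
    data.into_iter().take(total).map(AtomicU64::into_inner).collect()
}

fn main() {
    let out = run(4, 1000);
    println!("filled {} slots", out.len());
}
```

    No thread ever waits on the mutex while still holding a read guard, and only the mutex holder ever calls `write()`. That ordering is what keeps this scheme free of lock-order deadlocks.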

    However, this solution still has a drawback: the first worker to run out of space has to wait until every other worker has also hit the out_of_space condition (or finished its tasks) before it can reallocate, because rwlock.write() blocks until all read() guards have been dropped.
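
    One possible mitigation is a cheap "pause requested" hint. This is not part of the answer and is only a sketch. The worker that reallocates sets a hypothetical `pause` flag (an `AtomicBool`) before calling `write()`. Every worker checks that flag once per iteration, which costs a single atomic load rather than a lock acquisition, and steps aside when it is set. The memmap is modeled as a `Vec<AtomicU64>`, and `pause`, `wait_or_grow` and `demo` are illustrative names. In `demo`, a long-lived reader that never runs out of space keeps its read guard until the growing worker is done. Without the hint, that would deadlock.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for the memmap: atomics can be written through a read guard.
pub struct Shared {
    data: RwLock<Vec<AtomicU64>>,
    /// Serializes reallocation, so at most one thread ever waits on `write()`.
    realloc: Mutex<()>,
    /// Hint from the reallocating thread: "please drop your read guards now".
    pause: AtomicBool,
}

/// Slow path, always entered with no read guard held: grows the storage if
/// `index` still does not fit, otherwise just waits out any reallocation.
fn wait_or_grow(shared: &Shared, index: usize) {
    let _realloc = shared.realloc.lock().unwrap();
    // Taking a read guard here cannot block: writers only exist under `realloc`.
    let too_small = index >= shared.data.read().unwrap().len();
    if too_small {
        shared.pause.store(true, Ordering::Relaxed); // ask readers to step aside
        let mut data = shared.data.write().unwrap(); // waits for current readers
        let new_len = (index + 1).max(data.len() * 2);
        data.resize_with(new_len, || AtomicU64::new(0));
        shared.pause.store(false, Ordering::Relaxed);
    }
}

fn worker(shared: &Shared, indices: Range<usize>) {
    let mut guard = shared.data.read().unwrap();
    for index in indices {
        // One relaxed load per iteration, instead of taking a lock.
        while shared.pause.load(Ordering::Relaxed) || index >= guard.len() {
            drop(guard);
            wait_or_grow(shared, index);
            guard = shared.data.read().unwrap();
        }
        guard[index].store(index as u64, Ordering::Relaxed);
    }
}

/// One worker must grow the storage from 4 to 100+ slots while a long-lived
/// reader, which never runs out of space, waits for that worker to finish.
/// Without the `pause` hint this would deadlock. Returns the final length.
pub fn demo() -> usize {
    let shared = Shared {
        data: RwLock::new((0..4).map(|_| AtomicU64::new(0)).collect()),
        realloc: Mutex::new(()),
        pause: AtomicBool::new(false),
    };
    let worker_done = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            let mut guard = shared.data.read().unwrap();
            while !worker_done.load(Ordering::Acquire) {
                if shared.pause.load(Ordering::Relaxed) {
                    drop(guard);
                    wait_or_grow(&shared, 0); // index 0 always fits: just wait
                    guard = shared.data.read().unwrap();
                }
                std::hint::spin_loop();
            }
        });
        s.spawn(|| {
            worker(&shared, 0..100);
            worker_done.store(true, Ordering::Release);
        });
    });
    shared.data.into_inner().unwrap().len()
}

fn main() {
    println!("final length: {}", demo());
}
```

    The price is one extra atomic load per task. Whether that is acceptable next to a single `data[index] = value` store is worth measuring rather than assuming.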

    I'd recommend refactoring this code to move the reallocation logic out of this method.

    Also, try to avoid explicit drops where possible in favor of RAII (letting guards go out of scope), which is usually good practice.
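
    Applying both suggestions, the reallocation can live in its own function and each read guard in its own block, so the worker needs no explicit `drop` calls. The sketch below is one way to do that under assumed names (`Shared`, `grow_to_fit`, `worker`, `run`), with the memmap modeled as a `Vec<AtomicU64>`. It keeps the double-checked locking, and with it the trade-off of waiting for all readers before growing.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for the memmap: atomics can be written through a read guard.
pub struct Shared {
    data: RwLock<Vec<AtomicU64>>,
    /// Serializes reallocation, so at most one thread ever waits on `write()`.
    realloc: Mutex<()>,
}

/// The reallocation logic, moved out of the worker. Must be called with no
/// read guard held; double-checks under the mutex before taking `write()`.
fn grow_to_fit(shared: &Shared, index: usize) {
    let _realloc = shared.realloc.lock().unwrap();
    // Cannot block: writers only exist while `realloc` is held, and we hold it.
    let too_small = index >= shared.data.read().unwrap().len();
    if too_small {
        let mut data = shared.data.write().unwrap();
        let new_len = (index + 1).max(data.len() * 2);
        data.resize_with(new_len, || AtomicU64::new(0));
    } // the write guard `data` (if taken) is released here
} // `_realloc` is released here

fn worker(shared: &Shared, indices: Range<usize>) {
    let mut next = indices.start;
    while next < indices.end {
        {
            let data = shared.data.read().unwrap();
            // Handle as many tasks as currently fit, under a single read guard.
            while next < indices.end && next < data.len() {
                data[next].store(next as u64, Ordering::Relaxed);
                next += 1;
            }
        } // read guard released here by RAII: no explicit drop() needed
        if next < indices.end {
            grow_to_fit(shared, next);
        }
    }
}

/// Runs `threads` workers over disjoint slices of `0..total`, starting from a
/// 4-slot storage, and returns the first `total` slots.
pub fn run(threads: usize, total: usize) -> Vec<u64> {
    let shared = Shared {
        data: RwLock::new((0..4).map(|_| AtomicU64::new(0)).collect()),
        realloc: Mutex::new(()),
    };
    let chunk = total / threads;
    thread::scope(|s| {
        for t in 0..threads {
            // The last worker also takes any remainder.
            let end = if t + 1 == threads { total } else { (t + 1) * chunk };
            let shared = &shared;
            s.spawn(move || worker(shared, t * chunk..end));
        }
    });
    let data = shared.data.into_inner().unwrap();
    data.into_iter().take(total).map(AtomicU64::into_inner).collect()
}

fn main() {
    let out = run(4, 1000);
    println!("filled {} slots", out.len());
}
```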