rustmutexrwlock

acquiring nested rwlocks in a thread-safe way?


I am managing some per-customer data in a thread-safe way as follows:

customer_id_to_data: Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>

It's very important that I not acquire the write lock on the HashMap unless it's absolutely necessary, which is only when I'm creating a new CustomerId or deleting one.

The function that is supposed to be adding or updating a single new CustomerData needs to look like this:

fn load_customer_data_impl(
    customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
    customer_id: &CustomerId,
    new_data: CustomerData,
) -> () {
    if !{
        let read_guard = Self::get_customer_id_to_data_read_guard(customer_id_to_data);
        if !read_guard.contains_key(customer_id) {
            false
        } else {
            let mut data_write_guard = read_guard
                .get(customer_id)
                .unwrap()  // just checked, still holding rlock
                .write()
                .unwrap_or_else(|| sys::process::exit(1));
            *data_write_guard = new_index;
            true
        }
    } {
        let mut write_guard = Self::get_customer_id_to_data_write_guard(customer_id_to_data);
        write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
        }
    }

This formulation satisfies the necessary conditions that

However, the compiler complains about this last point: it does not see that if I have moved new_data in the if clause, I will never reach the if body.

I can't figure out how to re-organize this method to prevent that from happening.

As an attempt, the following naive code does NOT work, because the read guard in the if clause is released before it is re-acquired in the if body which may give rise to a race condition:

if Self::get_customer_id_to_data_read_guard(customer_id_to_data).contains_key(customer_id) {
    let read_guard = Self::get_customer_id_to_data_read_guard(customer_id);
    let mut data_write_guard = read_guard
        .get(customer_id)
        .unwrap()  // just checked, still holding rlock
        .write()
        .unwrap_or_else(|| sys::process::exit(1));
    *data_write_guard = new_data;        
} else {
    let mut write_guard = Self::get_customer_id_to_data_write_guard(customer_id_to_data);
    write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
}

Another naive non-starter is like

if !get_read_lock().contains_key(...) {
    get_write_lock().insert(...);
} else {
    *get_read_lock().get_index_write_lock(...) = new_data;
}

which has the same failure mode of a race condition between checking the if clause and entering the else body.

I am aware of the if-let-chains RFC which would seem to be a way out, but don't see how to write a correct implementation without it.


Solution

  • I'll assume that this is your minimal reproducible example:

    use std::{
        collections::HashMap,
        sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard},
    };
    
    #[derive(Eq, PartialEq, Hash, Clone, Copy)]
    struct CustomerId;
    struct CustomerData;
    
    fn get_customer_id_to_data_read_guard(
        s: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
    ) -> RwLockReadGuard<HashMap<CustomerId, Arc<RwLock<CustomerData>>>> {
        s.read().unwrap()
    }
    
    fn get_customer_id_to_data_write_guard(
        s: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
    ) -> RwLockWriteGuard<HashMap<CustomerId, Arc<RwLock<CustomerData>>>> {
        s.write().unwrap()
    }
    
    fn load_customer_data_impl(
        customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
        customer_id: &CustomerId,
        new_data: CustomerData,
    ) -> () {
        if !{
            let read_guard = get_customer_id_to_data_read_guard(customer_id_to_data);
            if !read_guard.contains_key(customer_id) {
                false
            } else {
                let mut data_write_guard = read_guard
                    .get(customer_id)
                    .unwrap() // just checked, still holding rlock
                    .write()
                    .unwrap_or_else(|_| std::process::exit(1));
                *data_write_guard = new_data;
                true
            }
        } {
            let mut write_guard = get_customer_id_to_data_write_guard(customer_id_to_data);
            write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
        }
    }
    

    The solution is quite simple. The reason why the compiler gets confused is that it doesn't understand that false/true in your if condition reflect different lifetimes of new_data. Lifetimes are a compile time thing, and the content of a boolean variable is a runtime information.

    The point where the compiler knows that new_data is available is where you generate the false value - simply put your code there :)

    The only pitfall is that the read_guard also still exists there, so aquiring a write_guard there would cause a deadlock. But luckily the drop() function exists to fix that:

    fn load_customer_data_impl(
        customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
        customer_id: &CustomerId,
        new_data: CustomerData,
    ) -> () {
        let read_guard = get_customer_id_to_data_read_guard(customer_id_to_data);
        if !read_guard.contains_key(customer_id) {
            drop(read_guard);
            let mut write_guard = get_customer_id_to_data_write_guard(customer_id_to_data);
            write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
        } else {
            let mut data_write_guard = read_guard
                .get(customer_id)
                .unwrap() // just checked, still holding rlock
                .write()
                .unwrap_or_else(|_| std::process::exit(1));
            *data_write_guard = new_data;
        }
    }
    

    You can further improve your code by avoiding the .contains_key function, which causes an unnecessary table lookup. Simply use .get directly:

    fn load_customer_data_impl(
        customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
        customer_id: &CustomerId,
        new_data: CustomerData,
    ) -> () {
        let read_guard = get_customer_id_to_data_read_guard(customer_id_to_data);
        if let Some(existing_data) = read_guard.get(customer_id) {
            let mut data_write_guard = existing_data
                .write()
                .unwrap_or_else(|_| std::process::exit(1));
            *data_write_guard = new_data;
        } else {
            drop(read_guard);
            let mut write_guard = get_customer_id_to_data_write_guard(customer_id_to_data);
            write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
        }
    }
    

    As a last remark: unwrap_or_else(|_| std::process::exit(1)) is kind of an antipattern. Even panic!() or .unwrap() still cause destructors of objects to run, which could close file handles or shutdown database connections. exit() prevents all those things, so use it with care.