I am managing some per-customer data in a thread-safe way as follows:
customer_id_to_data: Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>
It's very important that I not acquire the write lock on the HashMap unless it's absolutely necessary, which is only when I'm creating a new CustomerId
or deleting one.
The function that is supposed to be adding or updating a single new CustomerData needs to look like this:
fn load_customer_data_impl(
customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
customer_id: &CustomerId,
new_data: CustomerData,
) -> () {
if !{
let read_guard = Self::get_customer_id_to_data_read_guard(customer_id_to_data);
if !read_guard.contains_key(customer_id) {
false
} else {
let mut data_write_guard = read_guard
.get(customer_id)
.unwrap() // just checked, still holding rlock
.write()
.unwrap_or_else(|| sys::process::exit(1));
*data_write_guard = new_index;
true
}
} {
let mut write_guard = Self::get_customer_id_to_data_write_guard(customer_id_to_data);
write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
}
}
This formulation satisfies the necessary conditions that
.get().unwrap()
new_data
can only be moved once, either in the if
clause of the if
body.However, the compiler complains about this last point: it does not see that if I have moved new_data
in the if clause, I will never reach the if
body.
I can't figure out how to re-organize this method to prevent that from happening.
As an attempt, the following naive code does NOT work, because the read guard in the if
clause is released before it is re-acquired in the if
body which may give rise to a race condition:
if Self::get_customer_id_to_data_read_guard(customer_id_to_data).contains_key(customer_id) {
let read_guard = Self::get_customer_id_to_data_read_guard(customer_id);
let mut data_write_guard = read_guard
.get(customer_id)
.unwrap() // just checked, still holding rlock
.write()
.unwrap_or_else(|| sys::process::exit(1));
*data_write_guard = new_data;
} else {
let mut write_guard = Self::get_customer_id_to_data_write_guard(customer_id_to_data);
write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
}
Another naive non-starter is like
if !get_read_lock().contains_key(...) {
get_write_lock().insert(...);
} else {
*get_read_lock().get_index_write_lock(...) = new_data;
}
which has the same failure mode of a race condition between checking the if
clause and entering the else
body.
I am aware of the if-let-chains RFC which would seem to be a way out, but don't see how to write a correct implementation without it.
I'll assume that this is your minimal reproducible example:
use std::{
collections::HashMap,
sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard},
};
#[derive(Eq, PartialEq, Hash, Clone, Copy)]
struct CustomerId;
struct CustomerData;
fn get_customer_id_to_data_read_guard(
s: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
) -> RwLockReadGuard<HashMap<CustomerId, Arc<RwLock<CustomerData>>>> {
s.read().unwrap()
}
fn get_customer_id_to_data_write_guard(
s: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
) -> RwLockWriteGuard<HashMap<CustomerId, Arc<RwLock<CustomerData>>>> {
s.write().unwrap()
}
fn load_customer_data_impl(
customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
customer_id: &CustomerId,
new_data: CustomerData,
) -> () {
if !{
let read_guard = get_customer_id_to_data_read_guard(customer_id_to_data);
if !read_guard.contains_key(customer_id) {
false
} else {
let mut data_write_guard = read_guard
.get(customer_id)
.unwrap() // just checked, still holding rlock
.write()
.unwrap_or_else(|_| std::process::exit(1));
*data_write_guard = new_data;
true
}
} {
let mut write_guard = get_customer_id_to_data_write_guard(customer_id_to_data);
write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
}
}
The solution is quite simple. The reason why the compiler gets confused is that it doesn't understand that false
/true
in your if
condition reflect different lifetimes of new_data
. Lifetimes are a compile time thing, and the content of a boolean variable is a runtime information.
The point where the compiler knows that new_data
is available is where you generate the false
value - simply put your code there :)
The only pitfall is that the read_guard
also still exists there, so aquiring a write_guard
there would cause a deadlock. But luckily the drop()
function exists to fix that:
fn load_customer_data_impl(
customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
customer_id: &CustomerId,
new_data: CustomerData,
) -> () {
let read_guard = get_customer_id_to_data_read_guard(customer_id_to_data);
if !read_guard.contains_key(customer_id) {
drop(read_guard);
let mut write_guard = get_customer_id_to_data_write_guard(customer_id_to_data);
write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
} else {
let mut data_write_guard = read_guard
.get(customer_id)
.unwrap() // just checked, still holding rlock
.write()
.unwrap_or_else(|_| std::process::exit(1));
*data_write_guard = new_data;
}
}
You can further improve your code by avoiding the .contains_key
function, which causes an unnecessary table lookup. Simply use .get
directly:
fn load_customer_data_impl(
customer_id_to_data: &Arc<RwLock<HashMap<CustomerId, Arc<RwLock<CustomerData>>>>>,
customer_id: &CustomerId,
new_data: CustomerData,
) -> () {
let read_guard = get_customer_id_to_data_read_guard(customer_id_to_data);
if let Some(existing_data) = read_guard.get(customer_id) {
let mut data_write_guard = existing_data
.write()
.unwrap_or_else(|_| std::process::exit(1));
*data_write_guard = new_data;
} else {
drop(read_guard);
let mut write_guard = get_customer_id_to_data_write_guard(customer_id_to_data);
write_guard.insert(*customer_id, Arc::new(RwLock::new(new_data)));
}
}
As a last remark: unwrap_or_else(|_| std::process::exit(1))
is kind of an antipattern. Even panic!()
or .unwrap()
still cause destructors of objects to run, which could close file handles or shutdown database connections. exit()
prevents all those things, so use it with care.