I have come up with this code to intern strings and manipulate &'static references to them.
I believe it is sound, but I'd like to have confirmation of this.
My rationale is as follows:
- The HashSet is statically allocated, and does not cause reallocation of its values (its internal storage might get reallocated, but the heap buffers owned by the Strings should not move).
- Access to the HashSet is guarded by a Mutex.
- Since Strings are never removed from the HashSet, transmute'ing their content to &'static should be safe.

use lazy_static::lazy_static;
use std::{collections::HashSet, sync::Mutex};
lazy_static! {
    static ref LOCKED_HASH: Mutex<HashSet<String>> = Mutex::new(HashSet::new());
}

pub fn intern_string(input: &str) -> &'static str {
    let mut lock = LOCKED_HASH.lock().unwrap();
    if let Some(val) = lock.get(input) {
        unsafe { std::mem::transmute::<&str, &'static str>(val) }
    } else {
        lock.insert(input.to_string());
        let interned = lock.get(input).unwrap();
        unsafe { std::mem::transmute::<&str, &'static str>(interned) }
    }
}
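The key assumption in the rationale is that when the HashSet resizes, it moves the String structs (pointer, length, capacity) into a new table, but the bytes those Strings point to stay where they are. A small sketch that checks this empirically is below; buffer_addresses_across_growth is a hypothetical helper name, and observing stable addresses on one run illustrates the behavior rather than proving the unsafe code sound.

```rust
use std::collections::HashSet;

/// Returns the address of `first`'s heap buffer before and after the set has
/// been forced to grow (and therefore to reallocate its internal table).
fn buffer_addresses_across_growth(first: &str) -> (*const u8, *const u8) {
    let mut set: HashSet<String> = HashSet::new();
    set.insert(first.to_string());
    let capacity_before = set.capacity();
    let before = set.get(first).unwrap().as_ptr();

    // Insert enough other strings that the table has to be reallocated.
    for i in 0..10_000 {
        set.insert(format!("filler-{i}"));
    }
    assert!(set.capacity() > capacity_before);

    // The String was moved into a new table, but its heap buffer was not.
    let after = set.get(first).unwrap().as_ptr();
    (before, after)
}
```

Calling buffer_addresses_across_growth("hello") should return the same address twice, even though the set's capacity grew from a handful of slots to over 10,000.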
Is this really safe?
May I suggest the following instead, using String::leak,
which doesn't need unsafe code at all?
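use hashbrown::HashSet;
use std::sync::{LazyLock, RwLock};

pub fn intern_string(input: &str) -> &'static str {
    static SET: LazyLock<RwLock<HashSet<&'static str>>> =
        LazyLock::new(RwLock::default);
    // Fast path: a shared read lock is enough if the string is already interned.
    if let Some(r) = SET.read().unwrap().get(input) {
        return r;
    }
    // Slow path: take the write lock; a new String is leaked only if no other
    // thread inserted `input` in the meantime.
    SET.write()
        .unwrap()
        .get_or_insert_with(input, |s| String::from(s).leak())
}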
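If you would rather not depend on hashbrown: as far as I know, std's HashSet::get_or_insert_with is still unstable (behind the hash_set_entry feature), so a std-only version has to repeat the lookup under the write lock by hand. This is a sketch of the same String::leak idea with that double check written out, not a drop-in replacement that has been tested under load.

```rust
use std::collections::HashSet;
use std::sync::{LazyLock, RwLock};

/// Interns `input` using only the standard library: each distinct string is
/// leaked once with `String::leak`, so the returned reference is `'static`.
pub fn intern_string(input: &str) -> &'static str {
    static SET: LazyLock<RwLock<HashSet<&'static str>>> =
        LazyLock::new(RwLock::default);

    // Fast path: most lookups only need the shared read lock.
    if let Some(&s) = SET.read().unwrap().get(input) {
        return s;
    }

    let mut set = SET.write().unwrap();
    // Another thread may have inserted `input` between releasing the read
    // lock and acquiring the write lock, so check again before leaking.
    if let Some(&s) = set.get(input) {
        return s;
    }
    let leaked: &'static str = String::from(input).leak();
    set.insert(leaked);
    leaked
}
```

Interning "hello" twice, once from a literal and once from a freshly allocated String, returns two references with the same address, which std::ptr::eq can confirm.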
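Using RwLock instead of Mutex also reduces (but does not eliminate) contention between threads: looking up a string that is already interned only takes the shared read lock, and only inserting a new string needs the exclusive write lock. And lazy_static is no longer necessary now that LazyLock is in the standard library (stable since Rust 1.80).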
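To see why the read path contends less: an RwLock lets several threads hold read guards at the same time. In this sketch (readers_overlap is a hypothetical helper name), each of n threads takes a read guard and then waits on a Barrier, which only releases once all n threads are inside the lock simultaneously. With a Mutex in place of the RwLock, the same code would deadlock. Writers still need exclusive access, which is why contention is reduced rather than eliminated.

```rust
use std::sync::{Barrier, RwLock};
use std::thread;

/// Returns true once `n` threads have all held a read guard on the same
/// RwLock at the same moment (the barrier cannot release otherwise).
fn readers_overlap(n: usize) -> bool {
    let lock = RwLock::new(String::from("interned"));
    let barrier = Barrier::new(n);

    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..n {
            handles.push(s.spawn(|| {
                let guard = lock.read().unwrap();
                // Every thread blocks here while still holding its read guard.
                barrier.wait();
                guard.len()
            }));
        }
        // Each reader saw the full 8-byte string "interned".
        handles.into_iter().all(|h| h.join().unwrap() == 8)
    })
}
```

readers_overlap(4) returning true means four readers were inside the lock together; swapping RwLock::read for Mutex::lock would make the barrier wait forever.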