rust unsafe string-interning

Is this string interner logic sound?


I came up with this code to intern strings and hand out &'static references to them.

I believe it is sound, but I'd like to have confirmation of this.

My rationale is as follows:

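use lazy_static::lazy_static;
use std::{collections::HashSet, sync::Mutex};

lazy_static! {
    // Global set that owns every interned string; entries are never removed.
    static ref LOCKED_HASH: Mutex<HashSet<String>> = Mutex::new(HashSet::new());
}

pub fn intern_string(input: &str) -> &'static str {
    let mut lock = LOCKED_HASH.lock().unwrap();
    if let Some(val) = lock.get(input) {
        // Already interned: extend the borrow of the stored String to 'static.
        unsafe { std::mem::transmute::<&str, &'static str>(val) }
    } else {
        // Not interned yet: store an owned copy, then look it up again.
        lock.insert(input.to_string());
        let interned = lock.get(input).unwrap();
        // Same lifetime extension as in the branch above.
        unsafe { std::mem::transmute::<&str, &'static str>(interned) }
    }
}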

Is this really safe?


Solution

  • May I suggest the following instead? It uses String::leak, which needs no unsafe code at all:

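    use hashbrown::HashSet;
    use std::sync::{LazyLock, RwLock};
    
    pub fn intern_string(input: &str) -> &'static str {
        static SET: LazyLock<RwLock<HashSet<&'static str>>> =
            LazyLock::new(RwLock::default);
        // Fast path: a shared read lock is enough for already-interned strings.
        if let Some(r) = SET.read().unwrap().get(input) {
            // `r` is a `&&'static str` borrowed from the guard; copy out the inner reference.
            return *r;
        }
        // Slow path: take the write lock and leak a copy only if the string is still absent.
        *SET.write()
            .unwrap()
            .get_or_insert_with(input, |s| String::from(s).leak())
    }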
    

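    Using RwLock instead of Mutex also reduces (but does not eliminate) contention between threads, because looking up an already-interned string only takes the shared read lock. And lazy_static is no longer necessary now that LazyLock is in the standard library (stable since Rust 1.80).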
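
If pulling in hashbrown is not an option, the same idea can be sketched with the standard library alone. As far as I know, std's HashSet::get_or_insert_with is still unstable, so this version checks again by hand after taking the write lock. The name intern_std is hypothetical and only used for this illustration; treat it as a minimal sketch rather than a drop-in replacement.

```rust
use std::collections::HashSet;
use std::sync::{LazyLock, RwLock};

/// Std-only variant of the interner (hypothetical name, for illustration only).
pub fn intern_std(input: &str) -> &'static str {
    static SET: LazyLock<RwLock<HashSet<&'static str>>> =
        LazyLock::new(RwLock::default);

    // Fast path: shared read lock for strings that are already interned.
    if let Some(&existing) = SET.read().unwrap().get(input) {
        return existing;
    }

    // Slow path: exclusive lock. Look again, because another thread may have
    // interned the same string between the two lock acquisitions.
    let mut set = SET.write().unwrap();
    if let Some(&existing) = set.get(input) {
        return existing;
    }

    // Leak an owned copy so it lives for the rest of the program, then record it.
    let leaked: &'static str = String::from(input).leak();
    set.insert(leaked);
    leaked
}
```

Two calls with equal contents should return the very same reference, so interned strings can be compared by pointer (for example with std::ptr::eq) instead of by content. Note that leaked strings are never freed, which is the intended trade-off for an interner that hands out &'static str.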