rust mutability

How does interior mutability work for caching behavior?


I'm trying to create a struct that takes a Path and, on demand, loads the image from the path specified. Here's what I have so far:

extern crate image;

use std::cell::{RefCell};
use std::path::{Path};
use image::{DynamicImage};

pub struct ImageCell<'a> {
    image: RefCell<Option<DynamicImage>>,
    image_path: &'a Path, 
}

impl<'a> ImageCell<'a> {
    pub fn new<P: AsRef<Path>>(image_path: &'a P) -> ImageCell<'a>{
        ImageCell { image: RefCell::new(None), image_path: image_path.as_ref() }
    }

    //copied from https://doc.rust-lang.org/nightly/std/cell/index.html#implementation-details-of-logically-immutable-methods
    pub fn get_image(&self) -> &DynamicImage {
        {
            let mut cache = self.image.borrow_mut();
            if cache.is_some() {
                return cache.as_ref().unwrap(); //Error here
            }

            let image = image::open(self.image_path).unwrap();
            *cache = Some(image);
        }

        self.get_image()
    } 
}

This fails to compile:

src/image_generation.rs:34:24: 34:29 error: `cache` does not live long enough
src/image_generation.rs:34                 return cache.as_ref().unwrap();
                                                  ^~~~~
src/image_generation.rs:30:46: 42:6 note: reference must be valid for the anonymous lifetime #1 defined on the block at 30:45...
src/image_generation.rs:30     pub fn get_image(&self) -> &DynamicImage {
src/image_generation.rs:31         {
src/image_generation.rs:32             let mut cache = self.image.borrow_mut();
src/image_generation.rs:33             if cache.is_some() {
src/image_generation.rs:34                 return cache.as_ref().unwrap();
src/image_generation.rs:35             }
                           ...
src/image_generation.rs:32:53: 39:10 note: ...but borrowed value is only valid for the block suffix following statement 0 at 32:52
src/image_generation.rs:32             let mut cache = self.image.borrow_mut();
src/image_generation.rs:33             if cache.is_some() {
src/image_generation.rs:34                 return cache.as_ref().unwrap();
src/image_generation.rs:35             }
src/image_generation.rs:36 
src/image_generation.rs:37             let image = image::open(self.image_path).unwrap();
                           ...

I think I understand why: the lifetime of cache is tied to the borrow_mut() call.

Is there any way to structure the code so that this works?


Solution

  • I'm not totally convinced you need interior mutability here. However, I do think the solution you've proposed is generally useful, so I'll elaborate on one way to achieve it.

    The problem with your current code is that RefCell provides dynamic borrowing semantics: borrows of its contents are checked at run time and are opaque to Rust's borrow checker. When you try to return a &DynamicImage while the image still lives inside the RefCell, the RefCell has no way to track that outstanding borrow. If a RefCell allowed that to happen, then other code could overwrite the contents of the RefCell while there was a loan out of &DynamicImage. Whoops! Memory safety violation.

    For this reason, borrowing a value out of a RefCell is tied to the lifetime of the guard you get back when you call borrow_mut(). In this case, the lifetime of the guard is the stack frame of get_image, which no longer exists after the function returns. Therefore, you cannot borrow the contents of a RefCell like you're doing.
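    To make the point about guard lifetimes concrete, here is a minimal sketch of what the compiler would accept: returning the guard itself (a std::cell::Ref narrowed with Ref::map) rather than a bare reference. This is a different technique from the one developed in the rest of this answer, and the DynamicImage stand-in and its hard-coded contents are assumptions made purely for illustration.

```rust
use std::cell::{Ref, RefCell};

// Hypothetical stand-in for the real image type, for illustration only.
struct DynamicImage {
    data: Vec<u8>,
}

struct ImageCell {
    image: RefCell<Option<DynamicImage>>,
}

impl ImageCell {
    fn new() -> ImageCell {
        ImageCell { image: RefCell::new(None) }
    }

    // Returns the `Ref` guard instead of `&DynamicImage`, so the `RefCell`
    // keeps tracking the borrow for as long as the caller holds it.
    fn get_image(&self) -> Ref<'_, DynamicImage> {
        // The shared borrow taken here is a temporary that is released at
        // the end of this statement.
        let is_empty = self.image.borrow().is_none();
        if is_empty {
            // Short mutable borrow to fill the cache. The image contents
            // are made up; a real version would load from disk here.
            *self.image.borrow_mut() = Some(DynamicImage { data: vec![1, 2, 3] });
        }
        // Narrow `Ref<Option<DynamicImage>>` down to the image inside.
        Ref::map(self.image.borrow(), |opt| {
            opt.as_ref().expect("cache was just filled")
        })
    }
}

fn main() {
    let cell = ImageCell::new();
    let img = cell.get_image();
    println!("image data: {:?}", img.data);
}
```

    The caller gets something that derefs to a DynamicImage, but the borrow is now visible to the RefCell: any attempt to borrow_mut() the cell while a Ref is alive panics at run time instead of corrupting memory.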

    An alternative approach (while maintaining the requirement of interior mutability) is to move values in and out of the RefCell. This enables you to retain cache semantics.

    The basic idea is to return a guard that contains the dynamic image along with a pointer back to the cell it originated from. Once you're done with the dynamic image, the guard will be dropped and we can add the image back to the cell's cache.

    To maintain ergonomics, we impl Deref on the guard so that you can mostly pretend like it is a DynamicImage. Here's the code with some comments and a few other things cleaned up:

    use std::cell::RefCell;
    use std::io;
    use std::mem;
    use std::ops::Deref;
    use std::path::{Path, PathBuf};
    
    struct ImageCell {
        image: RefCell<Option<DynamicImage>>,
        // Accept a one-time allocation into a `PathBuf` to avoid dealing
        // with the lifetime.
        image_path: PathBuf,
    }
    
    impl ImageCell {
        fn new<P: Into<PathBuf>>(image_path: P) -> ImageCell {
            ImageCell {
                image: RefCell::new(None),
                image_path: image_path.into(),
            }
        }
    
        fn get_image(&self) -> io::Result<DynamicImageGuard> {
            // `take` transfers ownership out from the `Option` inside the
            // `RefCell`. If there was no value there, then generate an image
            // and return it. Otherwise, move the value out of the `RefCell`
            // and return it.
            let image = match self.image.borrow_mut().take() {
                None => {
                    println!("Opening new image: {:?}", self.image_path);
                    DynamicImage::open(&self.image_path)?
                }
                Some(img) => {
                    println!("Retrieving image from cache: {:?}", self.image_path);
                    img
                }
            };
            // The guard provides the `DynamicImage` and a pointer back to
            // `ImageCell`. When it's dropped, the `DynamicImage` is added
            // back to the cache automatically.
            Ok(DynamicImageGuard { image_cell: self, image: image })
        }
    }
    
    struct DynamicImageGuard<'a> {
        image_cell: &'a ImageCell,
        image: DynamicImage,
    }
    
    impl<'a> Drop for DynamicImageGuard<'a> {
        fn drop(&mut self) {
            // When a `DynamicImageGuard` goes out of scope, this method is
            // called. We move the `DynamicImage` out of its current location
            // and put it back into the `RefCell` cache.
            println!("Adding image to cache: {:?}", self.image_cell.image_path);
            let image = mem::replace(&mut self.image, DynamicImage::empty());
            *self.image_cell.image.borrow_mut() = Some(image);
        }
    }
    
    impl<'a> Deref for DynamicImageGuard<'a> {
        type Target = DynamicImage;
    
        fn deref(&self) -> &DynamicImage {
            // This improves the ergonomics of a `DynamicImageGuard`. Because
            // of this impl, most uses of `DynamicImageGuard` can be written
            // as if it were just a `&DynamicImage`.
            &self.image
        }
    }
    
    // A dummy image type.
    struct DynamicImage {
        data: Vec<u8>,
    }
    
    // Dummy image methods.
    impl DynamicImage {
        fn open<P: AsRef<Path>>(_p: P) -> io::Result<DynamicImage> {
            // Open image on file system here.
            Ok(DynamicImage { data: vec![] })
        }
    
        fn empty() -> DynamicImage {
            DynamicImage { data: vec![] }
        }
    }
    
    fn main() {
        let cell = ImageCell::new("foo");
        {
            let img = cell.get_image().unwrap(); // opens new image
            println!("image data: {:?}", img.data);
        } // adds image to cache (on drop of `img`)
        let img = cell.get_image().unwrap(); // retrieves image from cache
        println!("image data: {:?}", img.data);
    } // adds image back to cache (on drop of `img`)
    

    There is a really important caveat to note here: there is only one cache slot, which means that if you call get_image a second time before the first guard has been dropped, a new image will be generated from scratch, since the cell will be empty. This behavior is hard to change (in safe code) because you've committed to a solution that uses interior mutability. Generally speaking, the whole point of interior mutability is to mutate something without the caller being able to observe it. Indeed, that should be the case here, assuming that opening an image always returns precisely the same data.
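    The caveat can be observed with a stripped-down version of the same guard pattern. This is only a sketch: the Image type, the loads counter, and the Option inside the guard (used so that drop can move the image out with take()) are assumptions added for the demonstration and are not part of the code above.

```rust
use std::cell::{Cell, RefCell};
use std::ops::Deref;

// Hypothetical stand-in for a value that is expensive to load.
struct Image {
    data: Vec<u8>,
}

struct ImageCell {
    slot: RefCell<Option<Image>>,
    // Counts how often the "expensive" load ran; exists only for this demo.
    loads: Cell<usize>,
}

struct Guard<'a> {
    cell: &'a ImageCell,
    // `Option` so that `drop` can move the image out with `take()`.
    image: Option<Image>,
}

impl ImageCell {
    fn new() -> ImageCell {
        ImageCell { slot: RefCell::new(None), loads: Cell::new(0) }
    }

    fn get_image(&self) -> Guard<'_> {
        // Move the cached image out, leaving the slot empty.
        let cached = self.slot.borrow_mut().take();
        let image = match cached {
            Some(img) => img,
            None => {
                self.loads.set(self.loads.get() + 1);
                Image { data: vec![0; 4] }
            }
        };
        Guard { cell: self, image: Some(image) }
    }
}

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        // Put the image back. If another guard already refilled the slot,
        // that earlier value is overwritten and dropped.
        *self.cell.slot.borrow_mut() = self.image.take();
    }
}

impl<'a> Deref for Guard<'a> {
    type Target = Image;

    fn deref(&self) -> &Image {
        self.image.as_ref().expect("image is present until drop")
    }
}

fn main() {
    let cell = ImageCell::new();
    {
        let a = cell.get_image(); // slot is empty: load #1
        let b = cell.get_image(); // `a` still owns the image: load #2
        println!("{} {}", a.data.len(), b.data.len());
    }
    let _c = cell.get_image(); // slot was refilled on drop: no new load
    println!("loads: {}", cell.loads.get()); // prints "loads: 2"
}
```

    Two overlapping guards cost two loads, while a guard taken after both have been dropped is served from the cache.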

    This approach can be generalized to be thread safe (by using Mutex for interior mutability instead of RefCell) and possibly more performant by choosing a different caching strategy depending on your use case. For example, the regex crate uses a simple memory pool to cache compiled regex state. Since this caching should be opaque to callers, it is implemented with interior mutability using precisely the same mechanism outlined here.
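    As a rough sketch of the thread-safe generalization, the RefCell can be swapped for a std::sync::Mutex while keeping the same take-and-return guard. The names SharedImageCell, Image, and Guard are hypothetical, the image contents are made up, and this is not how the regex crate implements its pool; it only shows the shape of the idea.

```rust
use std::ops::Deref;
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for the cached value.
struct Image {
    data: Vec<u8>,
}

struct SharedImageCell {
    slot: Mutex<Option<Image>>,
}

struct Guard<'a> {
    cell: &'a SharedImageCell,
    // `Option` so that `drop` can move the image out with `take()`.
    image: Option<Image>,
}

impl SharedImageCell {
    fn new() -> SharedImageCell {
        SharedImageCell { slot: Mutex::new(None) }
    }

    fn get_image(&self) -> Guard<'_> {
        // Hold the lock only long enough to take the cached value out;
        // the lock guard is a temporary released at the end of this line.
        let cached = self.slot.lock().unwrap().take();
        // Made-up contents; a real version would open the file here.
        let image = cached.unwrap_or_else(|| Image { data: vec![7; 3] });
        Guard { cell: self, image: Some(image) }
    }
}

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        // Skip the write-back if the mutex is poisoned, so that `drop`
        // itself never panics.
        if let Ok(mut slot) = self.cell.slot.lock() {
            *slot = self.image.take();
        }
    }
}

impl<'a> Deref for Guard<'a> {
    type Target = Image;

    fn deref(&self) -> &Image {
        self.image.as_ref().expect("image is present until drop")
    }
}

fn main() {
    let cell = SharedImageCell::new();
    // Scoped threads may borrow `cell` without an `Arc`.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                let img = cell.get_image();
                assert_eq!(img.data.len(), 3);
            });
        }
    });
    // Every guard has been dropped, so the slot holds an image again.
    assert!(cell.slot.lock().unwrap().is_some());
}
```

    The single-slot caveat still applies: threads that overlap each build their own image, and only the last guard dropped leaves its image in the slot. A pool holding several values would avoid that at the cost of more memory.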