pointers, rust, slice

How does Rust know how big any of the smart pointer types are supposed to be?


In the following Rust code, a Box or Rc containing a u32 is 8 bytes, while a Box or Rc containing a slice is 16 bytes. I understand why this is; a smart pointer pointing to a dynamically sized area of memory (like a slice) will include the length of that area of memory in addition to the address.

use std::mem::size_of;
use std::rc::Rc;

// Sizes in the comments are for a 64-bit target.
fn main() {
    println!("Size of Box<u32>: {} bytes", size_of::<Box<u32>>()); // 8
    println!("Size of Box<[u32]>: {} bytes", size_of::<Box<[u32]>>()); // 16
    println!("Size of Box<str>: {} bytes", size_of::<Box<str>>()); // 16

    println!("Size of Rc<u32>: {} bytes", size_of::<Rc<u32>>()); // 8
    println!("Size of Rc<[u32]>: {} bytes", size_of::<Rc<[u32]>>()); // 16
    println!("Size of Rc<str>: {} bytes", size_of::<Rc<str>>()); // 16
}

My question is how Rust's compiler knows how big the pointer needs to be and how that additional length is actually used or interpreted.

For starters, here's my best guess at what's going on with the size of the pointer. Both Rc and Box have the following structures:

pub struct Box<T: ?Sized, A: Allocator = Global>(Unique<T>, A);
pub struct Rc<T: ?Sized, A: Allocator = Global> {
    ptr: NonNull<RcInner<T>>,
    phantom: PhantomData<RcInner<T>>,
    alloc: A,
}

I'm assuming that somehow the Allocator type is giving the Box/Rc an extra 8 bytes of space for the length, but I don't know how that is actually happening. I've looked through the allocator code but I don't understand what including the allocator here actually does. If I were to make my own pointer/container type, would including the Allocator type in my struct be sufficient to have this behavior?

As for how the length gets used, my understanding is that both Box<T> and Rc<T> can be transparently coerced to &T, and that &T is a primitive type that the compiler expects to be either a pointer to a single unit of data or a pointer plus a length for an array of data. Given that, the Deref implementation confuses me.

impl<T: ?Sized, A: Allocator> Deref for Rc<T, A> {

    type Target = T;

    #[inline(always)]
    fn deref(&self) -> &T {
        &self.inner().value
    }
}

I feel like I would expect this to somehow "lose" the information about the length, since that information is included with the Rc and not the RcInner or the value T. But instead, the compiler magically knows that because Rc<[T]> is itself a pointer, just like &[T], it knows that Rc also has this somewhat hidden length following the address, though Rc never includes it explicitly. I know that the length isn't stored with [T] like you might expect, it's stored on the stack with the address of [T], whether that's through a Box, an Rc, or a plain reference.

So those are my two questions: How is the additional data for a fat pointer defined/established in the data structure itself, and how does Rust know that a type like Rc or Box is supposed to be not just a pointer, but a fat pointer with additional information for the length?



Solution

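  • The Allocator has nothing to do with it (the default Global allocator is a zero-sized type, so it adds no bytes). It's a lot simpler than that: Box and Rc each contain a primitive raw pointer, and that raw pointer carries the extra metadata whenever it needs to be a fat pointer.

    For example, Rc<T> where T is an unsized type such as [u32] or str is 16 bytes on a 64-bit target because it contains a NonNull<RcInner<T>>, which is itself 16 bytes. NonNull<T> is just a newtype around *const T. Because value is the last field of RcInner<T>, a pointer to RcInner<[u32]> carries the slice length, and &self.inner().value keeps that length when it produces a &[u32].

    Similarly, Box<T> contains a Unique<T>, which contains a NonNull<T>.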
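
To make this concrete, here is a minimal sketch showing that a hand-written pointer type picks up the fat-pointer layout without any allocator field, and that projecting to a field through a fat pointer keeps the length. MyPtr and Inner are hypothetical names made up for illustration (Inner is only a stand-in for something shaped like RcInner, not the standard library's actual definition).

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical pointer wrapper: no allocator field, just a raw non-null pointer.
#[allow(dead_code)]
struct MyPtr<T: ?Sized>(NonNull<T>);

// Hypothetical stand-in for something like RcInner:
// a header field followed by a possibly unsized value as the last field.
struct Inner<T: ?Sized> {
    count: usize,
    value: T,
}

fn main() {
    let word = size_of::<usize>();

    // Raw pointers are thin for Sized pointees and pointer + length for slices.
    assert_eq!(size_of::<*const u32>(), word);
    assert_eq!(size_of::<*const [u32]>(), 2 * word);

    // The wrapper simply inherits the size of the pointer it contains.
    assert_eq!(size_of::<MyPtr<u32>>(), word);
    assert_eq!(size_of::<MyPtr<[u32]>>(), 2 * word);
    assert_eq!(size_of::<MyPtr<str>>(), 2 * word);

    // A pointer to a struct whose last field is unsized is fat as well.
    assert_eq!(size_of::<MyPtr<Inner<[u32]>>>(), 2 * word);

    // Unsizing coercion: the array length 3 moves out of the type
    // and into the pointer's metadata.
    let sized: Box<Inner<[u32; 3]>> = Box::new(Inner { count: 1, value: [10u32, 20, 30] });
    let boxed: Box<Inner<[u32]>> = sized;

    // Borrowing the `value` field reuses the length carried by the pointer.
    let slice: &[u32] = &boxed.value;
    assert_eq!(slice.len(), 3);
    println!("count = {}, value = {:?}", boxed.count, slice);
}
```

So for a custom pointer or container type, declaring the type parameter as T: ?Sized and storing a raw pointer (or NonNull) to T should be enough to get this behavior. The compiler decides the pointer's layout from the pointee type, not from anything the wrapper declares.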