In the following Rust code, a Box or Rc containing a u32 is 8 bytes, while a Box or Rc containing a slice is 16 bytes. I understand why this is; a smart pointer pointing to a dynamically sized area of memory (like a slice) will include the length of that area of memory in addition to the address.
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    // Sizes in the comments are for a 64-bit target.
    println!("Size of Box<u32>: {} bytes", size_of::<Box<u32>>()); // 8
    println!("Size of Box<[u32]>: {} bytes", size_of::<Box<[u32]>>()); // 16
    println!("Size of Box<str>: {} bytes", size_of::<Box<str>>()); // 16
    println!("Size of Rc<u32>: {} bytes", size_of::<Rc<u32>>()); // 8
    println!("Size of Rc<[u32]>: {} bytes", size_of::<Rc<[u32]>>()); // 16
    println!("Size of Rc<str>: {} bytes", size_of::<Rc<str>>()); // 16
}
My question is how Rust's compiler knows how big the pointer needs to be and how that additional length is actually used or interpreted.
For starters, here's my best guess at what's going on with the size of the pointer. Both Rc and Box have the following structures:
pub struct Box<T: ?Sized, A: Allocator = Global>(Unique<T>, A);

pub struct Rc<T: ?Sized, A: Allocator = Global> {
    ptr: NonNull<RcInner<T>>,
    phantom: PhantomData<RcInner<T>>,
    alloc: A,
}
I'm assuming that somehow the Allocator type is giving the Box/Rc an extra 8 bytes of space for the length, but I don't know how that is actually happening. I've looked through the allocator code but I don't understand what including the allocator here actually does. If I were to make my own pointer/container type, would including the Allocator type in my struct be sufficient to have this behavior?
As for how the length gets used, my understanding is that both Box<T> and Rc<T> can be transparently coerced to &T, and &T is a primitive type that the compiler more or less expects to be either a pointer to a single unit of data, or a pointer and a length for an array of data. But in that case, the Deref implementation confuses me.
impl<T: ?Sized, A: Allocator> Deref for Rc<T, A> {
    type Target = T;

    #[inline(always)]
    fn deref(&self) -> &T {
        &self.inner().value
    }
}
I feel like I would expect this to somehow "lose" the information about the length, since that information is included with the Rc and not the RcInner or the value T. But instead, the compiler magically knows that because Rc<[T]> is itself a pointer, just like &[T], Rc also has this somewhat hidden length following the address, though Rc never includes it explicitly. I know that the length isn't stored with [T] like you might expect; it's stored on the stack with the address of [T], whether that's through a Box, an Rc, or a plain reference.
So those are my two questions: How is the additional data for a fat pointer defined/established in the data structure itself, and how does Rust know that a type like Rc or Box is supposed to be not just a pointer, but a fat pointer with additional information for the length?
Here are some similar questions that have helped me understand how slices/pointers/fat pointers work as background for this:
Allocator has nothing to do with it; it's a lot simpler than that. Box and Rc simply contain a primitive pointer themselves, and that primitive pointer stores the extra metadata when it needs to be a fat pointer.
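To make that concrete, here is a minimal sketch of a pointer wrapper with no allocator field at all. MyPtr and wrapper_sizes are hypothetical names made up for this illustration, not standard library items. The wrapper's size follows the raw pointer inside it: one machine word when T is sized, two when T is a slice.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical wrapper for illustration only: a single pointer field, no allocator.
#[allow(dead_code)]
struct MyPtr<T: ?Sized>(NonNull<T>);

// Returns (size of MyPtr<u32>, size of MyPtr<[u32]>) in bytes.
fn wrapper_sizes() -> (usize, usize) {
    (size_of::<MyPtr<u32>>(), size_of::<MyPtr<[u32]>>())
}

fn main() {
    let word = size_of::<usize>();

    // The primitive pointer types already differ in size.
    assert_eq!(size_of::<*const u32>(), word);
    assert_eq!(size_of::<*const [u32]>(), 2 * word);
    assert_eq!(size_of::<*const str>(), 2 * word);

    // The wrapper inherits whatever size its pointer field has.
    let (thin, fat) = wrapper_sizes();
    assert_eq!(thin, word);
    assert_eq!(fat, 2 * word);
    println!("MyPtr<u32>: {thin} bytes, MyPtr<[u32]>: {fat} bytes");
}
```

So including an Allocator in your own type is neither necessary nor sufficient; what matters is holding a pointer to a T: ?Sized.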
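That is, Rc<T> with an unsized T such as [u32] or str (which the T: ?Sized bound permits) is 16 bytes on a 64-bit target because it contains a NonNull<RcInner<T>>, which is itself 16 bytes. RcInner<T> stores the T as its last field, so it is unsized whenever T is, and a pointer to it is a fat pointer. NonNull is just a newtype around *const T.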
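This also explains why Deref does not lose the length. The sketch below uses a hypothetical Inner struct as a stand-in for RcInner (the field names and the helper functions are assumptions for illustration). A reference to Inner<[u32]> is already a fat pointer, and borrowing its last field produces a &[u32] that carries the same length.

```rust
use std::mem::size_of;

// Hypothetical stand-in for RcInner: two counters followed by the value.
#[allow(dead_code)]
struct Inner<T: ?Sized> {
    strong: usize,
    weak: usize,
    value: T,
}

// Borrowing the trailing field keeps the pointer metadata (here, the length).
fn value_of<T: ?Sized>(inner: &Inner<T>) -> &T {
    &inner.value
}

// Builds a sized Inner<[u32; 3]> and lets the compiler unsize it to Inner<[u32]>.
fn boxed_inner() -> Box<Inner<[u32]>> {
    let sized: Box<Inner<[u32; 3]>> = Box::new(Inner {
        strong: 1,
        weak: 1,
        value: [10, 20, 30],
    });
    sized // unsizing coercion: the length 3 moves into the pointer
}

fn main() {
    let inner = boxed_inner();

    // A reference to the unsized struct is two words wide.
    assert_eq!(size_of::<&Inner<[u32]>>(), 2 * size_of::<usize>());

    // The slice obtained through the field borrow still knows its length.
    let slice: &[u32] = value_of(&*inner);
    assert_eq!(slice.len(), 3);
    println!("recovered slice: {slice:?}");
}
```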
Similarly, Box<T> contains a Unique<T>, which contains a NonNull<T>.
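As a rough illustration that the length lives in the pointer rather than in the heap allocation, the following sketch takes a Box<[u32]> apart into an address and a length and then puts it back together. The function name split_and_rebuild is hypothetical; the calls used (Box::into_raw, ptr::slice_from_raw_parts_mut, Box::from_raw) are standard library functions.

```rust
use std::ptr;

// Splits a boxed slice into (address, length), then rebuilds the same box.
fn split_and_rebuild(boxed: Box<[u32]>) -> (usize, Box<[u32]>) {
    // The raw pointer to a slice is fat: address plus length.
    let raw: *mut [u32] = Box::into_raw(boxed);

    // Casting to a thin pointer drops the length and keeps only the address.
    let addr = raw as *mut u32;
    // SAFETY: raw came from a live Box, so it is valid to read through.
    let len = unsafe { (*raw).len() };

    // Recombine the two halves into a fat pointer again.
    let rebuilt = ptr::slice_from_raw_parts_mut(addr, len);
    // SAFETY: same allocation and same length as the original box.
    let boxed_again = unsafe { Box::from_raw(rebuilt) };
    (len, boxed_again)
}

fn main() {
    let original = vec![1u32, 2, 3].into_boxed_slice();
    let (len, rebuilt) = split_and_rebuild(original);
    assert_eq!(len, 3);
    assert_eq!(rebuilt.to_vec(), vec![1u32, 2, 3]);
    println!("length carried by the pointer: {len}");
}
```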