I was surprised that Option<Vec<T>>
has the same size as Vec<T>
:
fn main() {
println!(
"u128: {} -> {}",
size_of::<u128>(),
size_of::<Option<u128>>()
);
println!(
"vec: {} -> {}",
size_of::<Vec<u16>>(),
size_of::<Option<Vec<u16>>>()
);
}
evaluating to
u128: 16 -> 32
vec: 24 -> 24
So
How does Rust represent the None
case for Option<Vec>
? It looks like a normal Vec
's 24 bytes come from 3 8-byte fields: a pointer, a byte capacity, and a length. My guess is that it's using a null pointer for None
, but I'm not certain. Would Rust still be able to maintain the same memory layout if Vec
held 2 pointers?
Why does Rust do this? For Option<u128>
we clearly have the choice of using N
bytes as long as N > 16
, and Rust chooses N = 32
. This of course has nice properties, e.g. we can access the i
th element of a Vec<Option<u128>>
with only bit shifts (no multiplications). But if that's ideal, shouldn't Rust make Option<Vec<T>>
32 bytes as well?
Rust version: rustc 1.83.0-nightly (1bc403daa 2024-10-11)
Option<Vec<_>>
You have discovered a niche value optimization guaranteed by Vec
. More specifically:
Vec<T>
contains...RawVec<T>
, which contains...RawVecInner
, which contains both...
Cap
, an usize
that can never exceed isize::MAX
(known to rustc
via rustc_layout_scalar_valid_range_end
)Unique<u8>
, which contains...NonNull<u8>
. The documentation of NonNull
specifically notes that the lack of NULL enables a memory layout optimization.As a result, Option::<Vec<_>>::None
is represented by 0 where the pointer would go (which is different than a valid instance of a NULL *const u8
pointer). Alternatively, it could be represented by a value exceeding isize::MAX
where Cap
would go. The other fields could be arbitrary for such a value. The compiler gets to choose which representation to use arbitrarily, but all zero bytes is be optimal for performance (easier to allocate).
I'm not sure what you mean by a Vec
that held two pointers. Adding an additional pointer to Vec
would increase the size of both Vec
and Option<Vec<_>>
by the size of a pointer.
Option<u128>
First, let's assume pointers are 8 bytes and the alignment of u128
is 16:
println!("{}", std::mem::size_of::<*const u8>()); // 8
println!("{}", std::mem::align_of::<u128>()); // 16
Consider the value vec![Some(0u128), Some(0u128)]
. For soundness, both u128
values must be aligned. By your logic, the first item in the Vec
must occupy at least 17 bytes (16 for the u128
, 1 for the enum
discriminant). The next aligned memory location (multiple of the alignment) would be 32 bytes from the beginning of the Vec
. Because Vec
doesn't have a way to add padding between items, the padding must fall within Option<u128>
, making it 32 bytes. In fact, size is a multiple of alignment for all types.
In case you're interested, the closest thing to a 16-byte Option<u128>
is an Option<NonZeroU128>
.
But if that's ideal, shouldn't Rust make Option<Vec> 32 bytes as well
As I mentioned, it's an alignment concern not an indexing performance concern. Vec
has an alignment of 8, so the size only needs to be a multiple of 8. If you were concerned with indexing performance, it's always possible to increase alignment:
/// Disclaimer: actual performance may vary;
/// consult your benchmark before making any optimization decision!
///
/// A 33% increase in memory usage will reduce cache locality!
#[repr(align(32))]
struct FastVec<T>(Vec<T>);