I'm new to Rust and I've been trying to understand how it stores enums in memory. I already know Rust implements tagged unions to represent enums. From what I've understood, this is what I should see in memory:
Consider the following piece of code:
enum MyEnum {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
}

fn main() {
    let v = vec![
        MyEnum::D,
        MyEnum::A(3, 2),
        MyEnum::B(10),
        MyEnum::C(true),
    ];
}
This is what I see inside actual memory:
03 00 00 00
00 03 02 FF
01 F0 0A 00
02 01 00 00
My explanation:
First row => TAG = 03 && VALUE = 3 null bytes
Second row => TAG = 00 && VALUE = (03, 02) && PADDING = 1 byte (I guess padding doesn't necessarily have to be a NULL byte)
Third row => TAG = 01 && PADDING = 1 byte && VALUE = 0A 00 (little-endian memory)
Fourth row => TAG = 02 && VALUE = 01 (true) && PADDING = 2 bytes
What I don't understand:
I don't quite understand the third row's layout: why does it have a padding byte right after the tag? Shouldn't it be at the end? It becomes even worse if I add a 32-bit field to the enum.
Second example with 32-bit field:
enum MyEnum {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
    E(u32),
}

fn main() {
    let v = vec![
        MyEnum::D,
        MyEnum::A(3, 2),
        MyEnum::B(10),
        MyEnum::C(true),
        MyEnum::E(12949),
    ];
}
This is what I see inside actual memory:
03 00 00 00 00 00 00 00
00 03 02 00 00 00 00 00
01 FF 0A 00 FF FF FF FF
02 01 7F FF FF 7F 00 00
04 00 00 00 95 32 00 00
What I don't understand:
Why doesn't the 32-bit value (0x3295 = 12949) start from the end like the 16-bit value in the previous example? Why is there padding right after the tag (1 byte) and right after the number (2 bytes)?
In your last example, the value 12949 actually occupies the last four bytes: 95 32 00 00 in little-endian order (0x95 + 0x32 * 256 = 12949).
This is a 4-byte word, so it is aligned to an address that is a multiple of 4.
The value 10 is stored in a 2-byte word, so it is aligned to an address that is a multiple of 2. If it were placed immediately after the tag, it would sit at offset 1, which is not a multiple of 2.
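The rounding rule described here can be sketched as a small helper. Note that `padding_needed` is a hypothetical name introduced only for illustration, not a standard library function; it computes how many bytes must be skipped so that a field placed after `offset` bytes starts on a multiple of `align`. The sketch assumes a one-byte tag at offset 0, as seen in the dumps in the question.

```rust
use std::mem::align_of;

/// Number of padding bytes to skip so that a field placed at or after
/// `offset` starts on a multiple of `align` (which must be a power of two).
/// Hypothetical helper, for illustration only.
fn padding_needed(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (align - offset % align) % align
}

fn main() {
    // Assumption: the tag is one byte at offset 0, so the payload
    // could start at offset 1 at the earliest.
    let after_tag = 1;

    // Padding required before a u16 payload and before a u32 payload,
    // using the alignments reported for the current target.
    println!("u16: {} padding byte(s)", padding_needed(after_tag, align_of::<u16>()));
    println!("u32: {} padding byte(s)", padding_needed(after_tag, align_of::<u32>()));
}
```

With an alignment of 2 for `u16` and 4 for `u32` (the usual values on targets such as x86-64), this gives 1 and 3 padding bytes, which matches the byte dumps in the question.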
The whole enum is aligned to at least the largest alignment required by any of its fields (always a power of 2), so that the alignment of each field can be guaranteed just by adding the required padding inside the enum.
That's why the enum grows from 4 bytes to 8 bytes when you add the last variant. If the whole enum is already aligned to a multiple of 4 and the first byte is used by the discriminant, then 3 bytes must be skipped to reach the next multiple of 4, where the 4-byte value is stored.
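One way to check the overall size and alignment yourself is with `std::mem::size_of` and `std::mem::align_of`. The following is only a sketch: the names `Small` and `Large` are made up here to hold both versions of the enum side by side, and `#[repr(u8)]` is an assumption I added so that the layout (a one-byte tag followed by a C-like payload) is specified. The question uses the default representation, whose layout the compiler is free to choose, although the dumps in the question are consistent with this one.

```rust
use std::mem::{align_of, size_of};

// Assumption: #[repr(u8)] pins the tag to one byte and gives each
// variant a C-like layout. The default representation is unspecified.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
}

// Same enum with the extra 32-bit variant from the second example.
#[allow(dead_code)]
#[repr(u8)]
enum Large {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
    E(u32),
}

fn main() {
    // The widest payload decides the alignment of the whole enum,
    // and the size is rounded up to a multiple of that alignment.
    println!("Small: size {}, align {}", size_of::<Small>(), align_of::<Small>());
    println!("Large: size {}, align {}", size_of::<Large>(), align_of::<Large>());
}
```

On a target where `u16` has alignment 2 and `u32` has alignment 4, this should report a size of 4 and alignment of 2 for `Small`, and a size of 8 and alignment of 4 for `Large`, in line with the 4-byte and 8-byte rows shown in the question.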