How does Rust store enums in memory?


I'm new to Rust and I've been trying to understand how it stores enums in memory. I already know that Rust implements enums as tagged unions, so I expect each value in memory to be a tag followed by the data of the active variant.

Consider the following piece of code:

enum MyEnum {
    A(u8, u8),
    B(u16),
    C(bool),
    D
}

fn main() {
    let v = vec![
        MyEnum::D,
        MyEnum::A(3, 2),
        MyEnum::B(10),
        MyEnum::C(true),
    ];
}

This is what I see inside actual memory:

03 00 00 00
00 03 02 FF
01 F0 0A 00
02 01 00 00

My explanation:

First row => TAG = 03 && VALUE = 3 null bytes

Second row => TAG = 00 && VALUE = (03, 02) && PADDING = 1 byte (I guess padding doesn't necessarily have to be a NULL byte)

Third row => TAG = 01 && PADDING = 1 byte && VALUE = 0A 00 (little-endian memory)

Fourth row => TAG = 02 && VALUE = 01 (true) && PADDING = 2 bytes

What I don't understand:

I don't quite understand the third row's layout: why does it have a padding byte right after the tag? Shouldn't it be at the end? It becomes even worse if I add a 32-bit field to the enum.


Second example with 32-bit field:

enum MyEnum {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
    E(u32)
}

fn main() {
    let v = vec![
        MyEnum::D,
        MyEnum::A(3, 2),
        MyEnum::B(10),
        MyEnum::C(true),
        MyEnum::E(12949)
    ];
}

This is what I see inside actual memory:

03 00 00 00 00 00 00 00
00 03 02 00 00 00 00 00
01 FF 0A 00 FF FF FF FF
02 01 7F FF FF 7F 00 00
04 00 00 00 95 32 00 00

What I don't understand:

Why doesn't the 32-bit value (0x3295 = 12949) start from the end like the 16-bit value in the previous example? Why is there padding right after the tag (1 byte) and right after the number (2 bytes)?


Solution

  • In your last example, the value 12949 does sit in the last four bytes: 95 32 00 00 in little-endian order (0x95 + 0x32 * 256 = 12949).

    It is a 4-byte word, so it must be placed at an address that is a multiple of 4.

    Likewise, the value 10 is stored in a 2-byte word, so it must be placed at an address that is a multiple of 2. If it came right after the tag, it would sit at offset 1 and would not be 2-byte aligned.

    The whole enum is aligned to the largest alignment required by any of its fields (2 in the first example, 4 in the second), so that the alignment of every field it contains can be guaranteed just by adding the required padding in front of it.

    That's why the enum grows from 4 bytes to 8 bytes when you add the u32 variant. If the whole enum starts at a multiple of 4 and the first byte is used by the discriminant, then the next 3 bytes must be skipped to reach the next multiple of 4, and the 4-byte payload follows from there.
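
The padding arithmetic above can be checked with a short program. This is only a sketch under one assumption: the enums are marked `#[repr(u8)]`, which is not in the question. Rust's default representation does not guarantee any particular layout, whereas `#[repr(u8)]` fixes a one-byte tag at offset 0 followed by the variant's fields laid out like a C struct, which matches the dumps shown in the question. The names `Small`, `Large`, `layout_of` and `round_up` are made up for this illustration.

```rust
use std::mem::{align_of, size_of};

// ASSUMPTION: `#[repr(u8)]` is added for this sketch only. It pins the tag to
// one leading byte and lays out each variant like a C struct. The default
// representation used in the question leaves the layout up to the compiler.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
}

#[allow(dead_code)]
#[repr(u8)]
enum Large {
    A(u8, u8),
    B(u16),
    C(bool),
    D,
    E(u32),
}

/// Returns (size, alignment) of `T` in bytes.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Rounds `offset` up to the next multiple of `align` (a power of two).
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

fn main() {
    // The tag occupies byte 0, so the first free offset is 1.
    let after_tag = size_of::<u8>();

    // A u16 payload has to start on a multiple of its alignment.
    println!("u16 payload offset: {}", round_up(after_tag, align_of::<u16>()));
    // The same rule applied to a u32 payload.
    println!("u32 payload offset: {}", round_up(after_tag, align_of::<u32>()));

    // (size, alignment) of each enum.
    println!("Small: {:?}", layout_of::<Small>());
    println!("Large: {:?}", layout_of::<Large>());

    // Little-endian byte order, as seen in the dumps.
    println!("{:02X?}", 10u16.to_le_bytes());
    println!("{:02X?}", 12949u32.to_le_bytes());
}
```

On a typical target where `u16` is 2-byte aligned and `u32` is 4-byte aligned (x86-64, for instance), the payload offsets come out as 2 and 4, and the enums as 4 bytes with alignment 2 and 8 bytes with alignment 4, in line with the row widths in the two dumps.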