rust solana

Under which scenarios is repr(C) useful aside from interoperating?


I've been using Rust for some time and have almost never seen a struct or enum with #[repr(C)]. However, I see lots of #[repr(C)] in the Solana Program Library (e.g., in the spl-token program source code).

After reading the "Type layout" chapter (especially "The Rust Representation" section) of the Rust Reference, I'm still confused about how exactly Rust lays out data. The doc says:

The only data layout guarantees made by this representation are those required for soundness. They are:

  1. The fields are properly aligned.
  2. The fields do not overlap.
  3. The alignment of the type is at least the maximum alignment of its fields.

It seems that the C representation also meets these guarantees. But the C representation is described in detail, while the Rust representation isn't, so I don't quite get the differences between the two. Does the Rust compiler do some complicated optimizations, like rearranging my original struct definition to make it more space-efficient? And are those optimizations so complicated that the doc simply omits the details?

My formal questions are:

  1. How exactly does the Rust representation differ from the C representation (for both structs and enums, and possibly other composite data structures)?
  2. Why would someone use #[repr(C)] when developing a "normal" Rust program? By "normal", I mean the program doesn't involve interoperation, and doesn't involve cross-compiling (to another target like Solana).
  3. Why does the Solana Dev Team always add #[repr(C)] to the structs and enums?

Solution

  • How exactly does the Rust representation differ from the C representation (for both structs and enums, and possibly other composite data structures)?

    It differs mostly in that it's unspecified, so the compiler team leaves itself free to change it at any time. AFAIK what it currently does is reorder fields to minimise padding (and thus total struct size), whereas repr(C) lays fields out in definition order (just adding padding where necessary).
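To make the difference concrete, here is a minimal sketch comparing the two representations. The struct names `DefaultRepr` and `CRepr` and the helper functions are made up for illustration, and the numbers in the comments assume a common target such as x86-64 where `u32` has 4-byte alignment.

```rust
use std::mem::size_of;

// Default (Rust) representation: the compiler may place the fields in any order.
#[allow(dead_code)]
struct DefaultRepr {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order, with padding inserted
// only where alignment requires it.
#[allow(dead_code)]
#[repr(C)]
struct CRepr {
    a: u8,
    b: u32,
    c: u8,
}

/// Byte offsets of `a`, `b` and `c` inside a `CRepr` value.
fn c_repr_offsets() -> (usize, usize, usize) {
    let v = CRepr { a: 0, b: 0, c: 0 };
    let base = &v as *const CRepr as usize;
    (
        std::ptr::addr_of!(v.a) as usize - base,
        std::ptr::addr_of!(v.b) as usize - base,
        std::ptr::addr_of!(v.c) as usize - base,
    )
}

/// Sizes of (`DefaultRepr`, `CRepr`) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<DefaultRepr>(), size_of::<CRepr>())
}
```

With `u32` aligned to 4 bytes, `c_repr_offsets()` yields `(0, 4, 8)` and `CRepr` occupies 12 bytes: 3 bytes of padding after `a` and 3 more after `c`. The size of `DefaultRepr` is whatever the compiler picks; nothing beyond the three guarantees quoted in the question should be relied on.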

  • Why would someone use #[repr(C)] when developing a "normal" Rust program? By "normal", I mean the program doesn't involve interoperation, and doesn't involve cross-compiling (to another target like Solana).

    Precisely controlling position and padding is important for working around issues like false sharing, where unrelated data on the same cache line is contended between cores and performance drops, or for ensuring the alignment of specific members (e.g. you might need an i32 to be 16-byte aligned for some hardware or kernel reason).
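As a rough illustration of the false-sharing case, the sketch below keeps two counters on separate cache lines. The 64-byte line size is an assumption (it is typical for x86-64 but not universal), and `CachePadded`, `Counters` and the helper functions are hypothetical names. Note that the alignment itself comes from `repr(align(N))`; `repr(C)` on the outer struct only pins the field order.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU64, Ordering};

// Wrapper that forces its contents onto a 64-byte boundary
// (64 is an assumed cache-line size).
#[repr(align(64))]
struct CachePadded<T>(T);

// repr(C) keeps `produced` first and `consumed` second; the wrapper's
// alignment then puts them on different (assumed) cache lines.
#[repr(C)]
struct Counters {
    produced: CachePadded<AtomicU64>,
    consumed: CachePadded<AtomicU64>,
}

fn new_counters() -> Counters {
    Counters {
        produced: CachePadded(AtomicU64::new(0)),
        consumed: CachePadded(AtomicU64::new(0)),
    }
}

/// Distance in bytes between the two counters.
fn counter_distance(c: &Counters) -> usize {
    let p = &c.produced as *const CachePadded<AtomicU64> as usize;
    let q = &c.consumed as *const CachePadded<AtomicU64> as usize;
    q - p
}

/// (size, alignment) of `Counters` in bytes.
fn counters_layout() -> (usize, usize) {
    (size_of::<Counters>(), align_of::<Counters>())
}

/// Increment the producer-side counter and return its new value.
fn bump_produced(c: &Counters) -> u64 {
    c.produced.0.fetch_add(1, Ordering::Relaxed) + 1
}
```

Here the two counters sit 64 bytes apart, so a thread hammering `produced` does not invalidate the line holding `consumed`. The same `repr(align(16))` trick on a wrapper around an `i32` would cover the 16-byte alignment case.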

    It's also relevant for things like dynamically linked libraries or zero-copy serialization, where both sides must agree on the precise memory layout. With repr(Rust) you don't know the layout (in the sense that it's not guaranteed and in theory could change at any point). Though I guess those could count as interop, even if they don't cross FFI boundaries.
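The zero-copy case might look roughly like the sketch below, which reinterprets a byte buffer as a struct without copying. The `Header` type, its fields, `view_header` and `demo_buffer` are all invented for this example; it assumes native-endian data and is not taken from the Solana code base.

```rust
use std::mem::{align_of, size_of};

// Fixed layout: magic at 0, version at 4, flags at 6, length at 8; 16 bytes, no padding.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    magic: u32,
    version: u16,
    flags: u16,
    length: u64,
}

/// Borrow the start of `bytes` as a `Header` without copying.
/// Returns `None` if the slice is too short or misaligned.
fn view_header(bytes: &[u8]) -> Option<&Header> {
    if bytes.len() < size_of::<Header>() {
        return None;
    }
    if bytes.as_ptr() as usize % align_of::<Header>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above; `Header` is repr(C),
    // has no padding, and every bit pattern is a valid value for its
    // integer fields.
    Some(unsafe { &*(bytes.as_ptr() as *const Header) })
}

// 8-byte aligned buffer so the demo bytes can be viewed in place.
#[repr(C, align(8))]
struct AlignedBuf([u8; 16]);

/// Build a sample buffer holding a native-endian header.
fn demo_buffer() -> AlignedBuf {
    let mut buf = AlignedBuf([0u8; 16]);
    buf.0[0..4].copy_from_slice(&0x00C0_FFEE_u32.to_ne_bytes());
    buf.0[4..6].copy_from_slice(&2_u16.to_ne_bytes());
    buf.0[6..8].copy_from_slice(&0_u16.to_ne_bytes());
    buf.0[8..16].copy_from_slice(&4096_u64.to_ne_bytes());
    buf
}
```

The cast is only defensible because `repr(C)` fixes the field offsets. With the default representation the compiler could reorder the fields, and the bytes written by one build might not mean the same thing to another.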

  • Why does the Solana Dev Team always add #[repr(C)] to the structs and enums?

    You'll have to ask them.