So there are several things that are clearly allowed under the strict aliasing rules (for clarity, let's do this in C23):
The first and most obvious is that structs are allowed to alias with pointers to their initial members:
typedef struct {
int data;
} parent;
typedef struct {
parent _base;
int metadata;
} child;
int main() {
child child_obj = {};
parent* parent_ptr = (parent*) &child_obj; // Fine to read and write
int* int_ptr = (int*) parent_ptr; // Also fine
}
The second is that unions of structs sharing a common initial sequence allow access to those common members through any of them, and a pointer to a union is freely convertible to/accessible as a pointer to any of its members.
#include <stddef.h> // size_t
#include <string.h> // memcpy
typedef struct {
int type_id;
size_t size;
char* buffer;
} dynamic_string;
typedef struct {
int type_id;
size_t number;
} just_a_number;
typedef union {
dynamic_string dystr;
just_a_number num;
struct {
int type_id;
char fixed_string[32];
};
} string_or_num;
int main() {
string_or_num obj = {};
if(obj.type_id == 2) // Fine, common initial sequence
memcpy(obj.fixed_string, "Hello World!", 13); // destination first; 12 characters plus the null terminator
// Fine
dynamic_string* ds_ptr = (dynamic_string*) &obj;
just_a_number* num_ptr = (just_a_number*) &obj;
// Also fine, pointer to initial common member
int* int_ptr = (int*) &obj;
}
Intuitively, I think I can combine these into something like the following. However, I'm not confident enough in my standardese to say with 100% certainty that it is kosher:
typedef struct {
int type_id;
char data[4];
} parent_a;
typedef struct {
int type_id;
float decimal;
} parent_b;
// No initial sequence
typedef struct {
double ccccombo_breaker;
} parent_c;
typedef struct {
union {
parent_a _base_a;
parent_b _base_b;
parent_c _base_c;
};
int look_at_me_ive_got_three_parents;
} child;
int main() {
child child_obj = {};
// Are these kosher?
parent_a* a_ptr = (parent_a*) &child_obj;
parent_b* b_ptr = (parent_b*) &child_obj;
// How about this?
parent_c* c_ptr = (parent_c*) &child_obj;
double* db_ptr = (double*) &child_obj;
}
To be clear, I'm not asking if something like parent_c is a good idea, just what the standard says about it. Would reads and writes through these pointers follow the aliasing rules?
Bonus points if you have exact language from the standard or a combination of standard sections that make a compelling case.
These are separate but mostly compatible rules:
Pointer to first member of struct/any member of union (6.7.2.1 §15 §16), particularly:
"The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa."
Inspecting any object byte by byte through a pointer to character type (6.3.2.3 §7)
"Strict aliasing" (chapter 6.5 §6 and §7)
Common initial sequence (6.5.2.3)
Additionally, there's the rule about "union type punning" (6.5.2.3 §3) which allows a member of a union to be converted/expressed as a different type, although this may invoke all manner of poorly-defined behavior in case of misalignment, out of range values, invalid/trap representations and so on.
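As a rough illustration of the "union type punning" and "inspect byte by byte" rules listed above, here is a minimal sketch. The function names are made up for this example, and it assumes that float and uint32_t have the same size (checked with a static_assert). The memcpy variant is included only as a commonly used alternative for comparison, not as something the rules above require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is 32 bits wide. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Union type punning: store one member, read another (6.5.2.3).
   The value read is the object representation reinterpreted as uint32_t. */
uint32_t float_bits_via_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}

/* Alternative for comparison: copy the representation with memcpy. */
uint32_t float_bits_via_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Byte-by-byte inspection through a pointer to character type:
   sums the bytes of any object's representation. */
unsigned byte_sum(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}
```

Both float_bits_... functions should return the same value for the same input, since each one just exposes the stored representation of the float.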
Your question is mainly about the first of these rules:
parent_a* a_ptr = (parent_a*) &child_obj;
parent_b* b_ptr = (parent_b*) &child_obj;
parent_c* c_ptr = (parent_c*) &child_obj;
These are fine as per that "any member of union" rule. An anonymous struct/union means that any of _base_a, _base_b and _base_c is "any member of a union" and therefore the pointer type is "suitably converted" by the cast. That these types happen to have a common initial sequence further down isn't really relevant. We can have a union of wildly incompatible types. Potential problems can only arise when accessing the actual data through a potentially non-compatible type or a type which is not an alias.
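To make the "pointer to any member of union" point concrete, here is a small sketch built on the types from the question. The helper names pointers_agree and roundtrip_type_id are made up for illustration. It only checks that the converted pointers land on the corresponding members, and that a value written through one view can be read back through the same view; it does not try to demonstrate anything about reading through a different member.

```c
#include <assert.h>

typedef struct { int type_id; char data[4]; } parent_a;
typedef struct { int type_id; float decimal; } parent_b;
typedef struct { double ccccombo_breaker; } parent_c;

typedef struct {
    union {                 /* anonymous union, first member of child */
        parent_a _base_a;
        parent_b _base_b;
        parent_c _base_c;
    };
    int look_at_me_ive_got_three_parents;
} child;

/* Returns nonzero if the casts point at the respective union members. */
int pointers_agree(child *c)
{
    return (parent_a *)c == &c->_base_a
        && (parent_b *)c == &c->_base_b
        && (parent_c *)c == &c->_base_c;
}

/* Write through the parent_b view and read back through the same view. */
int roundtrip_type_id(child *c, int id)
{
    parent_b *b_ptr = (parent_b *)c;
    b_ptr->type_id = id;
    b_ptr->decimal = 1.5f;
    return b_ptr->type_id;
}
```

Calling pointers_agree on any child object is expected to return nonzero, because the struct, its initial anonymous union, and each union member all start at the same address.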
double* db_ptr = (double*) &child_obj;
is however a bit questionable since the first object of child_obj is not a double. The pointer conversion itself is almost always fine, C allows pretty much any crazy conversion between object pointers (6.3.2.3 §7).
But if you de-reference db_ptr later, then you are on more questionable territory: the "pointer to any member of union" rule doesn't apply, so it becomes a question of strict aliasing. Strict aliasing in turn doesn't object to a double lvalue access to something that is potentially a double. And if the binary contents stored there (all zeroes) can also be represented as a double, then everything is in theory fine.
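If the goal is just to get at the double value, the question of whether de-referencing a double* is allowed can be sidestepped altogether. The sketch below shows two ways, assuming the types from the question: naming the union member directly, or copying the bytes out with memcpy. The function names are hypothetical, and this is only one possible way to avoid the cast, not something the standard mandates.

```c
#include <assert.h>
#include <string.h>

typedef struct { int type_id; char data[4]; } parent_a;
typedef struct { int type_id; float decimal; } parent_b;
typedef struct { double ccccombo_breaker; } parent_c;

typedef struct {
    union {
        parent_a _base_a;
        parent_b _base_b;
        parent_c _base_c;
    };
    int look_at_me_ive_got_three_parents;
} child;

/* Access through the named member: no pointer cast involved. */
double read_double_via_member(const child *c)
{
    return c->_base_c.ccccombo_breaker;
}

/* Copy the first sizeof(double) bytes of the object into a double.
   memcpy works on the object representation, so no double lvalue
   is ever formed on the child object itself. */
double read_double_via_memcpy(const child *c)
{
    double d;
    memcpy(&d, c, sizeof d);
    return d;
}
```

When _base_c is the member that was last stored, both functions should return the same value.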
Notably, the history of real-world compiler implementations of these rules isn't very pretty (particularly not strict aliasing and common initial sequence). Lots of things are left unclear by the standard and it is better not to rely on whatever the standard says/seems to say, because that's not necessarily how one particular compiler interprets it. Plus some compilers do not even have the ambition to become a quality implementation. It is best practice not to trust the compiler to get any of this right. For example, the latest gcc 13.2 goes completely bananas when facing the mentioned "inspect byte by byte" rule which has been in C since at least C99.