struct Node {
    struct Node *left;
    struct Node *right;
    int height;
    char data[];
};
This is how I used to define my data structure nodes. I find it very useful because I can embed the data directly in the node; otherwise I would have to use a void*, which would imply another malloc and another layer of indirection. However, I read that there are two issues with this kind of code:
Alignment
Let's say I have a function that copies some data to the data array with memcpy. If that data is a long long for example, then it would be misaligned. Would
_Alignas(max_align_t) char data[];
fix the problem?
Strict aliasing
The question is, as long as I access the data array with a pointer to a type that coincides with the type of the object from which I copied, is the strict aliasing rule obeyed, even if I theoretically access a char array through an int pointer for example? This is the only case I had to deal with type punning-like code.
If you do both of these things, i.e. ensure that the flexible array member is maximally aligned and only access that member via a pointer to the type the data was copied from, and you allocate the node dynamically, the behavior is well-defined.
Section 6.5p6 of the C standard describes what happens in this case:
The effective type of an object for an access to its stored value is the declared type of the object, if any. 98) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access....
98) Allocated objects have no declared type.
The sentence about memcpy and memmove, together with footnote 98, is what's relevant here. We'll see exactly how this applies with the example below, assuming the alignment specifier is applied to the data member:
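/* Requires <stdio.h>, <stdlib.h> and <string.h>; the constants assume a 64-bit long. */
long arr[3] = { 0x0102030405060708L, 0x0807060504030201L, -1L };
struct Node *node = malloc(sizeof(struct Node) + sizeof arr);
memcpy(node->data, arr, sizeof arr);
long *ptr = (long *)node->data;
printf("size=%zu\n", sizeof(struct Node));
for (int i = 0; i < 3; i++) {
    /* %lx expects an unsigned long, so convert explicitly */
    printf("ptr[%d]=%lx\n", i, (unsigned long)ptr[i]);
}
for (size_t i = 0; i < 3 * sizeof(long); i++) {
    /* %hhx expects an unsigned char; size_t index avoids a signed/unsigned comparison */
    printf("node->data[%zu]=%hhx\n", i, (unsigned char)node->data[i]);
}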
First, space is allocated dynamically for a struct Node plus 3 objects the size of a long and assigned to node. Immediately after allocation, the allocated bytes have no declared type, as per footnote 98.
After the call to memcpy, the sizeof(long)*3 bytes starting at node->data are an object with an effective type of long[3], as per the memcpy rule in 6.5p6, and the alignment specifier on the data member ensures that this object is properly aligned.
The assignment to ptr now results in this pointer pointing to the first element of the long[3] object that was copied via memcpy. The subsequent access to those objects by dereferencing ptr via the array indexing syntax then occurs through an lvalue matching the type of the objects, and is therefore valid.
The second loop, which accesses the copied array directly via the data member, is also valid, as it is always allowed to access the bytes of an object via a character type. Note that this only applies to reads, as writes would change the effective type of the object.
Section 6.5p7 specifies how an object is allowed to be accessed, with the first and last items in the list being the relevant ones here:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.