arrays, c, language-lawyer, union, undefined-behavior

Casting a pointer to a union to a pointer to the element type of an array member of the union


I have a union:

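#include <assert.h>  /* static_assert (needed before C23) */
#include <stdint.h>  /* uint64_t, uint32_t, uint16_t, uint8_t */

union Obj64 {
    uint64_t u64;
    uint32_t u32[2];
    uint16_t u16[4];
    uint8_t u8[8];
};
static_assert(sizeof(union Obj64) == sizeof(uint64_t), "Unexpected trailing padding in union!");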

Now I have a pointer to the union:

union Obj64 *union_ptr = SOME_VALUE;

I want a uint16_t * pointer to the same data. I could access the u16 member and use that as a uint16_t * due to pointer decay:

uint16_t *u16_ptr = union_ptr->u16;

But would casting the union to a uint16_t * directly be valid?

uint16_t *u16_ptr = (uint16_t *)union_ptr;

I know that:

A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa.

But the union does not contain any isolated uint16_t members but only a uint16_t array. So does the above rule apply?

  1. Is the above conversion valid?

  2. Are there any differences in terms of (un)defined behavior when using the pointer if the above conversion is valid in the first place?

    For example I might have a buffer of multiple union Obj64 objects. Could I use ((uint16_t *)union_ptr)[4] to access a uint16_t inside the next union in the buffer even though using union_ptr->u16 to do so would not be valid?


Solution

  • The C standard does not formally define the supported conversions, so trying to language-lawyer these is to some extent futile. C 2018 6.7.2.1 16 says “… A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa,” but nothing in the standard defines “suitably converted.”

    (uint16_t *)union_ptr produces a pointer to a uint16_t. None of the members of Obj64 is a uint16_t, so this does not point to any of its members, so 6.7.2.1 16 does not apply even if “suitably converted” is defined. The u16 member is an array, so (uint16_t (*)[4]) union_ptr is almost certainly a “suitably converted” pointer to that member.

    Even if you consider (uint16_t *)union_ptr to be first a conversion to a pointer to the array and then a conversion to a pointer to its first element, the C standard does not actually define the latter conversion. Given int x[4];, (int *) &x is a partially defined conversion (it produces a pointer that can be converted back to int (*)[4] to produce a pointer to the array), but it is not defined to necessarily point to x[0]. (It does in plenty of normal C implementations, but this is a language-lawyer question.)
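As a minimal sketch of the part of that conversion the standard does pin down, the function below converts a pointer to an array into an int * and back again, and checks only the round trip. The function name is hypothetical. The sketch deliberately does not compare the intermediate pointer with &x[0], since that equality is exactly what is not spelled out.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if converting a pointer to an array
 * to int * and back yields a pointer equal to the original. */
static int array_pointer_round_trips(void) {
    int x[4] = {1, 2, 3, 4};

    int *p = (int *)&x;               /* partially defined conversion */
    int (*back)[4] = (int (*)[4])p;   /* converting back is the guaranteed part */

    return back == &x;
}
```
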

    You can produce a pointer to the first element of the u16 member by using * (uint16_t (*)[4]) union_ptr to designate the array and allowing array-to-pointer conversion to occur, or you could take it explicitly with & (* (uint16_t (*)[4]) union_ptr)[0]. Or simply use union_ptr->u16 with array-to-pointer conversion.
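The three routes described above can be sketched side by side. The function names are made up for illustration, and the union definition is repeated from the question so the block stands on its own. Whether the cast to uint16_t (*)[4] counts as "suitably converted" is an assumption here, as discussed above.

```c
#include <assert.h>
#include <stdint.h>

union Obj64 {
    uint64_t u64;
    uint32_t u32[2];
    uint16_t u16[4];
    uint8_t u8[8];
};
static_assert(sizeof(union Obj64) == sizeof(uint64_t), "Unexpected trailing padding in union!");

/* Simplest route: name the member and let the array decay to a pointer. */
static uint16_t *u16_via_member(union Obj64 *p) {
    return p->u16;
}

/* Convert to pointer-to-array, dereference to designate the array,
 * then rely on array-to-pointer conversion. */
static uint16_t *u16_via_array_cast(union Obj64 *p) {
    return *(uint16_t (*)[4])p;
}

/* Same as above, but take the address of element 0 explicitly. */
static uint16_t *u16_via_explicit_element(union Obj64 *p) {
    return &(*(uint16_t (*)[4])p)[0];
}
```

In practice the first form is the one to prefer, since it needs no cast at all.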

    In any case, once you get a pointer to union_ptr->u16[0] by any method, you can use it to access union_ptr->u16[i] for 0 ≤ i < 4. Using it to access the memory beyond that is not safe without some additional prerequisite.

    … would ((uint16_t *)union_ptr)[x] be safe if x > 3 and union_ptr was a pointer to an element in a Obj64 array and there was an element after?

    No. Why would it be? ((uint16_t *)union_ptr)[x] includes arithmetic on, at best, a pointer to an element in an array, and that array has elements indexed from 0 to 3, and arithmetic on that pointer is not defined if the index it would refer to goes beyond 4. It is defined for 4, but dereferencing the pointer is not. The fact that Obj64 contains an array of 4 uint16_t does not mean that an array of, say, 2 Obj64 contains an array of 8 uint16_t.
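If the goal is to treat a buffer of Obj64 objects as a flat sequence of uint16_t values, one way to stay within each array's bounds is to split the flat index into an element index and an in-element index. This is only a sketch: the helper names and the U16_PER_OBJ constant are hypothetical, and the caller is assumed to pass an index that lies inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

union Obj64 {
    uint64_t u64;
    uint32_t u32[2];
    uint16_t u16[4];
    uint8_t u8[8];
};

enum { U16_PER_OBJ = 4 };  /* number of uint16_t elements in the u16 member */

/* Read the flat_index-th uint16_t of the buffer without ever indexing
 * past the end of any single u16 array. */
static uint16_t obj64_get_u16(const union Obj64 *buf, size_t flat_index) {
    return buf[flat_index / U16_PER_OBJ].u16[flat_index % U16_PER_OBJ];
}

/* Write counterpart of obj64_get_u16. */
static void obj64_set_u16(union Obj64 *buf, size_t flat_index, uint16_t value) {
    buf[flat_index / U16_PER_OBJ].u16[flat_index % U16_PER_OBJ] = value;
}
```

With this split, flat index 4 resolves to buf[1].u16[0]: the pointer arithmetic happens on the Obj64 array, and the u16 subscript stays in the range 0 to 3.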