
Are there any specific types or values for which type-punning produces identical behavior in all Standard-conforming C implementations?


Type-punning, or reinterpreting the underlying bits of one type as another, is notorious for having unpredictable and/or non-portable behavior.

For example:

union {
    unsigned u;
    float f;
} c = {.u = 10};
float f = c.f;

Not portable: the result depends on the representation of float.


union {
    unsigned char c[2];
    unsigned short s;
} c = {.c = {1, 2}};
unsigned short s = c.s;

Not portable: the result depends on the value of CHAR_BIT and the byte order (endianness) of the system.


However, will any of the following have Standard-guaranteed/portable behavior, provided all the <stdint.h> types are defined:

union {
    uint8_t b[2];
    uint16_t w;
} c = {.b = {0x18, 0x18}};
assert(c.w == 0x1818);

Or the contrary:

union {
    uint8_t b[2];
    uint16_t w;
} c = {.w = 0x1818};
assert(c.b[0] == 0x18 && c.b[1] == 0x18);

Or if I extend the size of the types:

union {
    uint16_t w[2];
    uint32_t l;
} c = {.w = {0x1818, 0x1818}};
assert(c.l == 0x18181818);

In the above examples, the byte-order does not matter because the number is 'cyclic' and has the same representation in big/little-endian, or in any other esoteric byte-order for that matter. The types are guaranteed to be exactly their specified bits wide and have no trap representations or padding bits.

For those reasons there is no logical reason for the type-pun to have non-portable behavior or return any value other than those specified in the assert(), but does the C Standard make the same guarantee explicitly? Are those examples truly portable?

The C Standard states that reading an inactive union member will 'reinterpret' the bits as the new type, but does that translate to the above examples having portable behavior? Or could some technically conforming C99 implementation, by some oddity, compile these examples yet not produce the expected results?


Solution

  • Type-punning refers to reinterpreting a representation of a type as another type. If types are guaranteed to have the same or sufficiently well-defined representations, then type-punning may be portable.

    This is confirmed explicitly in 6.2.5 Footnote 39 (Emphasis mine):

    The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

    Integers

    For signed integer types [...] Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value.

    This means that any value of an unsigned type that is less than or equal to the maximum value of the corresponding signed type will have the same value when type-punned to that signed type, and vice versa for non-negative signed values, since all the corresponding bits in the representation must have the same effect on the final value.

    This is guaranteed explicitly:

    A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value.

    Additionally:

    For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.

    Any integer object with all bits 0 has a value of 0. Therefore, any part of an all-zero integer object can be type-punned to any smaller integer type, and multiple smaller integers with all bits 0 (including padding bits, if any) may be type-punned to a larger one; the value will still be 0.

    This partially addresses the examples in the question, as we know the fixed-width types have no padding bits, so those examples shall work with values of 0.
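The two guarantees in this section can be sketched in code. This is only an illustration under stated assumptions: the helper names `pun_int_to_unsigned` and `pun_zero_bytes` are hypothetical, and the second helper assumes `uint8_t` and `uint16_t` are defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: store an int, read it back as unsigned via a union.
   Only non-negative values are covered by the quoted guarantee. */
static unsigned pun_int_to_unsigned(int v)
{
    union { int i; unsigned u; } c;
    assert(v >= 0);
    c.i = v;
    return c.u;   /* sign bit is zero, so the value is unchanged */
}

/* Hypothetical helper: two all-zero bytes read back as one 16-bit value.
   Assumes uint8_t and uint16_t exist (no padding bits). */
static uint16_t pun_zero_bytes(void)
{
    union { uint8_t b[2]; uint16_t w; } c = { .b = { 0, 0 } };
    return c.w;   /* all bits zero represents the value zero */
}
```

Calling `pun_int_to_unsigned` with any value from 0 to `INT_MAX` should return the same number, and `pun_zero_bytes` should return 0.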

    Fixed-width integers (if defined)

    Type-punning intN_t to uintN_t for the same N is equivalent to adding 2^N to the value if the intN_t value is negative. The reverse is equivalent to subtracting 2^N from the value if the uintN_t value is greater than the maximum value of intN_t.

    The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits.

    This requirement guarantees that there are no padding bits and, since they have the same total number of bits, the number of value bits in the intN_t must be one less than the number of value bits in the uintN_t.

    there shall be exactly one sign bit.

    And since all N − 1 value bits in the intN_t must have the same values as the corresponding bits in the representation of uintN_t, and since two's complement is required for all fixed-width types, by process of elimination the sign bit in intN_t must correspond to the value bit with value 2^(N−1) in the uintN_t. Thus, type-punning between them must have portable behavior as specified above.
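A small sketch for N = 16 may make the arithmetic concrete. It assumes `int16_t` and `uint16_t` are defined; the helper names `pun_i16_to_u16` and `expected_u16` are hypothetical. In two's complement the sign bit is worth −2^15 in the signed type and +2^15 in the unsigned type, so a negative value reads back 2^16 larger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret an int16_t as a uint16_t via a union. */
static uint16_t pun_i16_to_u16(int16_t v)
{
    union { int16_t s; uint16_t u; } c;
    c.s = v;
    return c.u;
}

/* Hypothetical helper: the same result computed arithmetically.
   long is used because it is at least 32 bits wide. */
static uint16_t expected_u16(int16_t v)
{
    long wide = v;
    if (wide < 0)
        wide += 65536L;               /* add 2^16 for negative values */
    assert(wide >= 0 && wide <= 65535L);
    return (uint16_t)wide;
}
```

For example, −1 should read back as 0xFFFF and `INT16_MIN` as 0x8000, while non-negative values are unchanged.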

    Pointers

    In 6.2.5:

    A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.

    This implies one can safely type-pun between void * and char *, or between any two struct pointers, or any two union pointers, or between pointers to qualified and unqualified versions of compatible types (e.g., int * and const int *). Although any object pointer can be converted to void * or char *, that is a conversion (implicit for void *, via a cast for char *), not a type-pun.
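As a minimal sketch of the first case, the following stores a `void *` in a union and reads it back as a `char *`. The function name `void_and_char_pointers_agree` is hypothetical and exists only for illustration.

```c
#include <assert.h>

/* Hypothetical check: a void * read back as char * through a union
   still designates the same byte, because both pointer types share
   one representation. Returns 1 if the punned pointer is usable. */
static int void_and_char_pointers_agree(void)
{
    char buf[] = "abc";
    union { void *v; char *c; } u;

    u.v = buf;                       /* store as void * */
    return u.c == buf && *u.c == 'a'; /* read as char *, no conversion */
}
```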

    Structures

    Type-punning between structures and other structures or types is generally non-portable, due to the unspecified amount of padding inserted between structure members. However there are some exceptions:

    In 6.5.2.3:

    Type-punning is portable between a structure and the first member of the structure, or between a union and any of its members, provided that type-punning between that member and the last-stored member of the union is itself portable.

    A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

    A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.
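The initial-member rule can be sketched as follows. The type `struct record` and the function `first_member_via_pointer` are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Hypothetical record type; id is its initial member. */
struct record {
    int id;
    double weight;
};

/* Reads the initial member through a converted pointer to the structure. */
static int first_member_via_pointer(int id)
{
    struct record r = { id, 0.0 };
    int *p = (int *)&r;     /* suitably converted: points to r.id */

    assert(p == &r.id);     /* and vice versa */
    return *p;
}
```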

    Additionally:

    One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

    This means that if you have several separate structure types whose initial members have compatible types and appear in the same order, those matching members may be accessed (type-punned) through any of the structures, so long as a complete declaration of a union containing them is visible at the point of access. Example from the C Standard:

    The following is a valid fragment:

    union {
       struct {
           int alltypes;
       } n;
       struct {
           int type;
           int intnode;
       } ni;
       struct {
           int type;
           double doublenode;
       } nf;
    } u;
    u.nf.type = 1;
    u.nf.doublenode = 3.14;
    /* ... */
    if (u.n.alltypes == 1)
       if (sin(u.nf.doublenode) == 0.0)
           /* ... */
    

    The following is not a valid fragment (because the union type is not visible within function f):

    struct t1 { int m; };
    struct t2 { int m; };
    int f(struct t1 *p1, struct t2 *p2)
    {
       if (p1->m < 0)
           p2->m = -p2->m;
       return p1->m;
    }
    int g()
    {
       union {
           struct t1 s1;
           struct t2 s2;
       } u;
       /* ... */
       return f(&u.s1, &u.s2);
    }
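For comparison, here is a sketch of a typical use of the common initial sequence rule in which the complete union type is visible at the point of access. All names (`struct int_node`, `struct double_node`, `union node`, `type_of_double_node`, `type_of_int_node`) are hypothetical and not taken from the Standard.

```c
#include <assert.h>

/* Hypothetical node types sharing the common initial member `type`. */
struct int_node    { int type; int value; };
struct double_node { int type; double value; };

union node {
    struct int_node    ni;
    struct double_node nf;
};

enum { NODE_INT = 0, NODE_DOUBLE = 1 };

/* Stores through nf, then inspects the common initial part through ni.
   The complete type of union node is visible here, as the rule requires. */
static int type_of_double_node(double v)
{
    union node n;
    n.nf.type = NODE_DOUBLE;
    n.nf.value = v;
    return n.ni.type;
}

/* The same inspection in the other direction. */
static int type_of_int_node(int v)
{
    union node n;
    n.ni.type = NODE_INT;
    n.ni.value = v;
    return n.nf.type;
}
```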
    

    Union between arrays of smaller fixed-width types and larger fixed-width types

    Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

    This means that, provided there are no padding bits (which is the case for fixed-width types), type-punning two consecutive objects of one type to a single object of a type twice as wide is guaranteed to have the effect of concatenating the bits of their object representations. Contiguous implies there can be no 'junk' between raw bytes in memory.

    For unsigned integer types [...] objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation;

    But does pure binary notation guarantee that the value bits in the object representation are ordered, increasingly, by magnitude?

    Pure binary notation is defined as:

    A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral powers of 2, except perhaps the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.) A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to 2^CHAR_BIT − 1.

    This explicitly mentions the representation, successive bits, and position. This implies that the bits in pure binary notation are ordered from lowest to highest. If this were not the case, the exact position of a bit within the representation would be meaningless, and the definition would only need to say that each value bit exists and corresponds to one power of 2 from 2^0 to 2^(N−1), without mentioning position or succession. However, this definition specifies that the bits are successive and ordered.

    Why, then, require that the value bits of signed integers have the same values as the corresponding bits of the unsigned type, if that would be redundant? Most likely, to make sure that the placement of the padding and/or sign bit does not 'offset' the value bits relative to the corresponding unsigned type.

    Given the above, concatenating identical copies of a fixed number of ordered bits into a new fixed number of ordered bits must produce the same value each time. A case could be made that any implementation that does not demonstrate the expected behavior in that case would violate the definition of pure binary notation.
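The examples from the question can be wrapped into a sketch that checks this reasoning. It rests on the argument above rather than on an explicit statement in the Standard, and it assumes the fixed-width types are defined; the helper names `bytes_to_word` and `words_to_long` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: two identical bytes read back as one 16-bit value. */
static uint16_t bytes_to_word(uint8_t byte)
{
    union { uint8_t b[2]; uint16_t w; } c = { .b = { byte, byte } };
    return c.w;
}

/* Hypothetical helper: two identical 16-bit values read back as 32 bits. */
static uint32_t words_to_long(uint16_t word)
{
    union { uint16_t w[2]; uint32_t l; } c = { .w = { word, word } };
    return c.l;
}
```

Because both halves are identical, the expected result is the same whichever half the implementation stores first: `bytes_to_word(0x18)` should be 0x1818 and `words_to_long(0x1818)` should be 0x18181818.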