c c99 unspecified-behavior bit-representation

Unspecified behaviour about "object having more than one object representation"

Still struggling with C (C99) undefined and unspecified behaviours.

This time it is the following Unspecified Behaviour (Annex J.1):

The representation used when storing a value in an object that has more than one object representation for that value (6.2.6.1).

The corresponding section 6.2.6.1 states:

Where an operator is applied to a value that has more than one object representation, which object representation is used shall not affect the value of the result⁴³⁾. Where a value is stored in an object using a type that has more than one object representation for that value, it is unspecified which representation is used, but a trap representation shall not be generated.

with the following note 43:

It is possible for objects x and y with the same effective type T to have the same value when they are accessed as objects of type T, but to have different values in other contexts. In particular, if == is defined for type T, then x == y does not imply that memcmp(&x, &y, sizeof(T)) == 0. Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T may distinguish between them.

I don't even understand what a value with more than one object representation would be. Is it related, for example, to the floating-point representations of zero (negative and positive zero)?

Solution

Most of this language is the C standard going well out of its way to allow for continued use on Burroughs B-series mainframes (AFAICT the only surviving ones'-complement architecture). Unless you have to work with those, or certain uncommon microcontrollers, or you're seriously into retrocomputing, you can safely assume that the integer types have only one object representation per value, and that they have no padding bits. You can also safely assume that all integer types have no trap representations, except that you must take this line of J.2:

[the behavior is undefined if ...] the value of an object ~~with automatic storage duration~~ is used while it is indeterminate

as if it were normative and as if the crossed-out words were not present. (This rule is not supported by a close reading of the actual normative text, but it is nonetheless the rule adopted by all of the current generation of optimizing compilers.)
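If you would rather verify the "no padding bits, one representation per value" assumption than take it on faith, a minimal sketch along these lines may help. The function names are hypothetical (they are not part of any library), and the check only covers `unsigned int` and `int`; it counts the value bits in `UINT_MAX` and uses the well-known `(-1 & 3)` idiom, which yields 3 on two's complement, 2 on ones' complement, and 1 on sign-magnitude.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of value bits in unsigned int,
   counted by shifting UINT_MAX down to zero. */
int uint_value_bits(void)
{
    int bits = 0;
    for (unsigned int v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}

/* Returns 1 if every bit of the object representation of
   unsigned int is a value bit (i.e. there are no padding bits). */
int uint_has_no_padding(void)
{
    return (size_t)uint_value_bits() == sizeof(unsigned int) * CHAR_BIT;
}

/* Returns 1 if int uses two's complement.
   (-1 & 3) is 3 on two's complement, 2 on ones' complement,
   and 1 on sign-magnitude. */
int int_is_twos_complement(void)
{
    return (-1 & 3) == 3;
}

/* Aborts (via assert) if the "mainstream" assumptions do not hold. */
void check_mainstream_int_assumptions(void)
{
    assert(uint_has_no_padding());
    assert(int_is_twos_complement());
}
```

Calling `check_mainstream_int_assumptions()` once at startup is enough; on a mainstream desktop or server toolchain it is expected to pass silently.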

Concrete examples of types that can have more than one object representation for a value on a modern, non-exotic implementation include:

_Bool: the effect of overwriting a _Bool object with the representation of an integer value other than an appropriately sized 0 or 1 is unspecified.
pointer types: some architectures ignore the low bits of a pointer to a type whose minimum alignment is greater than 1 (e.g. (int*)0x80000000 and (int*)0x80000001 might be treated as referring to the same int object; this is an intentional hardware feature, facilitating the use of tagged pointers).
floating point types: IEC 60559 allows all of the many representations of NaN to be treated identically (and possibly squashed together) by the hardware. (Note: +0 and −0 are distinct values in IEEE floating point, not different representations of the same value.)
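The floating-point case is the easiest one to observe. The sketch below assumes that `float` is IEEE 754 binary32 and that `uint32_t` exists; the function names and the two bit patterns are chosen purely for illustration. The first function builds two quiet NaNs that differ only in their payload, so both are "NaN" as far as `isnan` is concerned while `memcmp` sees different bytes. The second shows the contrasting case from footnote 43: +0 and −0 compare equal with `==`, yet they are distinct values that `signbit` can tell apart.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if two quiet NaNs with different payloads are both NaN
   yet differ byte-for-byte.
   Assumption: float is IEEE 754 binary32. */
int nan_has_many_representations(void)
{
    const uint32_t bits_a = 0x7fc00000u;  /* quiet NaN, payload 0 */
    const uint32_t bits_b = 0x7fc00001u;  /* quiet NaN, payload 1 */
    float a, b;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&a, &bits_a, sizeof a);  /* copy raw bytes, no conversion */
    memcpy(&b, &bits_b, sizeof b);

    return isnan(a) && isnan(b) && memcmp(&a, &b, sizeof a) != 0;
}

/* Returns 1 if +0 and -0 compare equal, have different object
   representations, and are distinguishable via signbit. */
int signed_zeros_are_distinct_values(void)
{
    float pz = 0.0f;
    float nz = -0.0f;

    return pz == nz
        && memcmp(&pz, &nz, sizeof pz) != 0
        && !signbit(pz)
        && signbit(nz);
}
```

Both functions are expected to return 1 on an implementation that follows IEC 60559; note that options such as `-ffast-math` may change how NaNs and signed zeros are handled.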

These are also the scalar types that can have trap representations in modern implementations. In particular, Annex F specifically declares the behavior of signaling NaN to be undefined, even though it's well-defined in an abstract implementation of IEC 60559.