I have seen the claim made that, in standard C, reading from an uninitialized object (specifically, an object with indeterminate representation) invokes undefined behavior. This also seems to be how compilers treat an uninitialized read in practice. For example, the following C program can output 0 0 using GCC 15.1:
#include <stdio.h>

unsigned char calculate(unsigned char *p)
{
    if (!p) return 0;
    unsigned char sum;
    printf("%d ", (int)sum);
    if (!sum) sum += *p;
    return sum;
}

int main() {
    unsigned char one = 1;
    printf("%d\n", calculate(&one));
    return 0;
}
(Godbolt: https://godbolt.org/z/vKEoGadM9)
It is also willing to produce the same result after applying ,s/unsigned char/int/g (replacing every unsigned char with int), and also after applying ,s/unsigned char/signed char/g.
If GCC didn't treat an uninitialized read as UB, I would expect the program to print 0 1 or the same nonzero integer twice.
However, I haven't found proof in the standard (I am using n3220 as a reference) that an uninitialized read of an integer with no padding bits (such as unsigned char; I would also expect int to have no padding bits on most platforms) is UB.
It may be worth noting that an uninitialized read does appear as the eleventh item of the J.2 list of undefined behaviors, although that annex is informative only.
I believe the relevant undefined behavior is defined in 6.2.6.1p5:
Certain object representations do not represent a value of the object type. If such a representation is read by an lvalue expression that does not have character type, the behavior is undefined. [...] Such a representation is called a non-value representation.
The connection to uninitialized automatic variables is found in 6.7.11p11:
If an object that has automatic storage duration is not initialized explicitly, its representation is indeterminate. [...]
and indeterminate representation is defined in 3.23:
object representation that either represents an unspecified value or is a non-value representation
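As an illustration of the character-type exception in the 6.2.6.1p5 quotation above, here is a minimal sketch (the helper count_nonzero_bytes is hypothetical) that reads every byte of an object's representation through lvalues of type unsigned char. Such reads fall outside the sentence that makes reading a non-value representation undefined. The sketch only inspects initialized objects, so by itself it says nothing about indeterminate ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the nonzero bytes in the object
   representation of the size-byte object at obj. Each byte is read
   through an lvalue of type unsigned char, the character-type case that
   the UB sentence of 6.2.6.1p5 does not cover. */
static size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t count = 0;

    assert(obj != NULL || size == 0);
    for (size_t i = 0; i < size; i++)
        if (bytes[i] != 0)
            count++;
    return count;
}
```

For an int with no padding bits, the value 0 has no nonzero bytes and the value 1 has exactly one, whatever the byte order.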
However, I wouldn't expect any integer type that contains no padding bits to have any non-value representations (which would make this UB impossible). For example, 6.2.6.2 explicitly states that unsigned char and signed char have no padding bits. I would also not expect int, or any other integer type that is neither a bit-precise type nor a bit-field, to have any padding bits on x86-64 GCC.
Indeed, GCC treats int as if the total number of bits in an int (which must be sizeof(int)*CHAR_BIT per 6.2.6.1p4) equals the number of non-padding bits (which must be INT_WIDTH per 6.2.6.2p2 and 5.2.5.3.2): https://godbolt.org/z/EfE5dacWK.
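To check the padding-bit reasoning on a given implementation without relying on INT_WIDTH, here is a minimal sketch. The helper names are hypothetical, and it relies on the C requirement that the maximum value of an integer type with N value bits is 2^N - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Number of bits needed to represent max, i.e. the number of value bits
   of a type whose maximum value is max (such maxima are 2^N - 1). */
static int count_value_bits(unsigned long long max)
{
    int bits = 0;
    for (; max != 0; max >>= 1)
        bits++;
    return bits;
}

/* Width of int: its value bits plus the sign bit. Computed from INT_MAX
   so that it does not depend on INT_WIDTH, which is new in C23. */
static int int_width(void)
{
    return count_value_bits(INT_MAX) + 1;
}

/* int has padding bits exactly when its width is smaller than the total
   number of bits in its object representation, sizeof(int) * CHAR_BIT. */
static bool int_has_padding_bits(void)
{
    return int_width() < (int)(sizeof(int) * CHAR_BIT);
}
```

On x86-64 GCC one would expect int_width() to return 32, so that assert(!int_has_padding_bits()) holds, consistent with the Godbolt link above. count_value_bits(UCHAR_MAX) should equal CHAR_BIT on every implementation, since unsigned char has no padding bits.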
My questions are:
unsigned char such that reading a byte of the object through an lvalue of type unsigned char is UB?

“If GCC didn't treat an uninitialized read as UB, I would expect the program to print 0 1 or the same nonzero integer twice.”
This expectation is incorrect because an unspecified value may vary from use to use. dbush’s answer addresses undefined behavior, but suppose we eliminate that by inserting (void) &sum; in your program. This takes the address of sum and so eliminates the effect of the clause about an object that “could have been declared with the register storage class” (C 2024 6.3.3.1). Then sum has an indeterminate representation, but using it does not have undefined behavior.
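A sketch of the modification described above, assuming the calculate() function from the question; the only change is the added (void) &sum; line. Because the address of sum is now taken, the register-storage-class clause of 6.3.3.1 no longer applies, and on the reading in this answer each use of sum yields an unspecified value rather than undefined behavior.

```c
#include <assert.h>
#include <stdio.h>

unsigned char calculate(unsigned char *p)
{
    if (!p) return 0;
    unsigned char sum;
    (void) &sum;             /* take the address of sum */
    printf("%d ", (int)sum); /* prints an unspecified value */
    if (!sum) sum += *p;     /* this use of sum may see a different value */
    return sum;
}
```

For a non-null argument, the printed value and the result remain unspecified, so nothing can be asserted about them; only the null path is determined, for example assert(calculate(NULL) == 0) holds.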
You are presuming that because printf("%d ", (int)sum); prints 0, the value of sum must be 0. Indeed, its value was taken as 0 for that statement. However, the definition of “indeterminate representation” in C 2024 3.23 is “object representation that either represents an unspecified value or is a non-value representation,” and the definition of “unspecified value” in C 2024 3.22.2 is, with emphasis added, “valid value of the relevant type where this document imposes no requirements on which value is chosen in any instance.”
Thus the implementation is free to take sum as having the value 0 in the instance printf("%d ", (int)sum);, as having the value 1 in the !sum of if (!sum) sum += *p; (so the addition is skipped), and as having the value 0 in return sum;.
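The per-use freedom can be made concrete with a toy model. This is a hypothetical illustration, not a description of how GCC works: the names unspecified_byte, read_unspecified, and calculate_model are invented here, and each read of the not-yet-assigned sum is routed through a script that supplies the value chosen at that read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model (not how any compiler works): an object whose
   representation is indeterminate, where every read may pick a different
   valid value. The script supplies the value chosen at each read. */
struct unspecified_byte {
    const unsigned char *script; /* value chosen at each successive read */
    size_t next;                 /* index of the next read */
};

static unsigned char read_unspecified(struct unspecified_byte *u)
{
    assert(u->script != NULL);
    return u->script[u->next++];
}

/* calculate() from the question, with each read of the not-yet-assigned
   sum routed through the model. *printed receives what the printf call
   would have printed. After sum += *p, sum holds a determinate value, so
   the model returns that value directly. */
static unsigned char calculate_model(const unsigned char *p,
                                     struct unspecified_byte *sum,
                                     int *printed)
{
    if (!p)
        return 0;
    *printed = read_unspecified(sum);  /* printf("%d ", (int)sum); */
    if (!read_unspecified(sum))        /* if (!sum) */
        return (unsigned char)(read_unspecified(sum) + *p); /* sum += *p; return sum; */
    return read_unspecified(sum);      /* return sum; */
}
```

With p pointing to 1, the script 0, 1, 0 (one value each for the printf, the !sum test, and the final return) makes the model print 0 and return 0, the output observed in the question. The script 0, 0, 0 gives the expected 0 1, and 7, 7, 7 gives the same nonzero integer twice; under this answer's reading, all three are permitted.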