
In C++26, are implementations required to "initialize" uninitialized variables to some fixed byte pattern?


In C++26, reading uninitialized variables is no longer undefined behavior; it's "erroneous" now (see "What is erroneous behavior? How is it different from undefined behavior?").

However, the wording for this confuses me:

[basic.indet]/1.2

otherwise, the bytes have erroneous values, where each value is determined by the implementation independently of the state of the program.

(emphasis mine on "independently of the state of the program")

To me, this reads as if the implementation must overwrite the values with something (e.g. 0xBEBEBEBE). Leaving them truly uninitialized might make them depend on the "state of the program", which would contradict the requirement that each value be determined independently of the state of the program.
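Here is a rough model, in ordinary C++, of the overwriting described above. It assumes a trivially copyable type. It uses 0xBE as the fill byte only because the question mentions 0xBEBEBEBE. The helper `pattern_initialized` and the constant `kPatternByte` are hypothetical, and real compilers pick their own fill values (or none).

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fill byte: the standard does not mandate any particular value.
constexpr unsigned char kPatternByte = 0xBE;

// Models what an implementation *might* do for a declaration `T x;`:
// overwrite every byte of x's storage with a fixed pattern before first use.
template <class T>
T pattern_initialized() {
    static_assert(std::is_trivially_copyable_v<T>,
                  "this sketch only models trivially copyable types");
    unsigned char bytes[sizeof(T)];
    std::memset(bytes, kPatternByte, sizeof bytes);
    T value;                                   // storage for the modelled variable
    std::memcpy(&value, bytes, sizeof value);  // now holds the fill pattern
    return value;
}
```

For `std::uint32_t` the result is 0xBEBEBEBE regardless of byte order, because all four bytes are identical.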

Is my interpretation correct? Are implementations forced to overwrite uninitialized variables now?


Solution

  • The linked paper P2795R5 says, under "Performance and security implications":

    • The automatic storage for an automatic variable is always fully initialized, which has potential performance implications. P2723R1 discusses the costs in some detail. Note that this cost even applies when a class-type variable is constructed that has no padding and whose default constructor initializes all members.
    • In particular, unions are fully initialized. ...

    It also points out that automatic local variables can be annotated with [[indeterminate]] to suppress this initialization, but there is no way to avoid it for temporaries.

    So it seems like your interpretation is correct.
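Here is a sketch of the [[indeterminate]] opt-out mentioned above, assuming a C++26 compiler. The hypothetical `checksum` function reads only bytes of `scratch` that it has already written, so the implicit initialization would be wasted work. Compilers without C++26 support must ignore attributes they don't recognize (usually with a warning), so the code behaves the same there.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: sums up to 64 bytes of src via a scratch buffer.
// Every byte of `scratch` that is read has been written first, so C++26's
// implicit initialization of the buffer would be pure overhead here.
int checksum(const unsigned char* src, std::size_t n) {
    // C++26 opt-out: with it, reading an element before writing it would be
    // undefined behaviour again rather than erroneous behaviour.
    unsigned char scratch [[indeterminate]] [64];
    if (n > sizeof scratch) n = sizeof scratch;
    std::memcpy(scratch, src, n);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += scratch[i];
    return sum;
}
```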

    Oddly, it doesn't seem to matter what this magic value is, or even whether this initialization really happens, except that it can't be a trap representation. As already pointed out, there's no byte value that is unambiguously erroneous at runtime and still safe to load, copy, and compare.
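One way to see why no byte value can serve as an unambiguous marker: for an unsigned integer type, every fill pattern is also a value that ordinary arithmetic can produce. The sketch below assumes 8-bit bytes and a 32-bit `std::uint32_t`. The names `fill_pattern` and `patterns_also_computable` are illustrative, not taken from the paper.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>

// Fills a uint32_t with byte b, as a pattern-initializing implementation might.
std::uint32_t fill_pattern(unsigned char b) {
    std::uint32_t v;
    std::memset(&v, b, sizeof v);
    return v;
}

// Counts how many fill bytes produce a value that plain arithmetic
// (b * 0x01010101) can also produce. If every byte qualifies, no fill
// pattern can be told apart from a legitimately computed value.
int patterns_also_computable() {
    int count = 0;
    for (unsigned b = 0; b <= UCHAR_MAX; ++b) {
        std::uint32_t computed = b * 0x01010101u;  // ordinary, non-erroneous value
        if (fill_pattern(static_cast<unsigned char>(b)) == computed) ++count;
    }
    return count;
}
```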


    Edit: why do I say it doesn't seem to matter what the magic value is, or even whether this initialization really happens?

    1. The motivation is to stop the evaluation (i.e. the lvalue-to-rvalue conversion, which turns a glvalue into a prvalue) of uninitialized automatic variables from being undefined behaviour. Instead it will be erroneous behaviour, which implementations are encouraged to diagnose.

      • If an implementation doesn't diagnose the erroneous behaviour, the result of the evaluation is a valid, non-erroneous value.
    2. The above can't depend on a specific bit pattern if that pattern could ever be produced by a valid expression; otherwise diagnostics would misfire.

      • No common primitive type has such a reserved bit pattern, apart from the now-uncommon trap representations.

      • e.g. you couldn't use either a quiet or a signalling NaN to mark erroneous values, because in

        #include <cmath>
        #include <limits>

        void example() {
            double fine = std::numeric_limits<double>::quiet_NaN();
            double errn;      // erroneous value in C++26

            std::isnan(fine); // not erroneous
            std::isnan(errn); // erroneous behaviour
        }

        the implementation must treat the two calls differently, so the distinction can't be based on the bit pattern.

      • The same is trivially true for integer types, and in any case [basic.indet]/2 says

        Except in the following cases, ... if an erroneous value is produced by an evaluation, the behavior is erroneous and the result of the evaluation is the value so produced but is not erroneous

        where all the exceptions concern unsigned ordinary character types and std::byte, so in:

        void foo(int);

        void example() {
            int errn;      // erroneous value
            foo(errn ^ 0); // 1, 2
            foo(errn);     // 3
        }
        
        1. reading errn for the XOR has erroneous behaviour, but if not diagnosed it must produce a non-erroneous value with exactly the same bit pattern as the erroneous input
        2. the call to foo with that non-erroneous value must not be diagnosed
        3. the call to foo with exactly the same bit pattern may be diagnosed
    3. If the only goal is to prevent the evaluation of uninitialized automatic variables from escalating to UB, it would be sufficient to require this kind of initialization only for types that have trap representations.

      It may also be necessary to disable (or guard with diagnostic checks) some optimizations that UB previously permitted, but depending on a specific bit pattern is neither necessary nor sufficient for that.
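Point 2 claims that an undiagnosed `errn ^ 0` yields a value bit-for-bit identical to its erroneous input. That claim can be checked with ordinary initialized values. This sketch uses an illustrative function name and compares object representations with `std::memcmp`. It assumes `int` has no padding bits, as on mainstream platforms.

```cpp
#include <cstring>

// Returns true if x ^ 0 has exactly the same object representation as x.
// With an initialized x this is ordinary, well-defined code. It models the
// point that nothing in the resulting bits records whether the input was
// an erroneous value.
bool xor_zero_preserves_bits(int x) {
    int y = x ^ 0;
    return std::memcmp(&x, &y, sizeof x) == 0;
}
```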