c++ reinterpret-cast

Semantics of reinterpret_cast<const unsigned char*>


I've stumbled upon the following code:

#include <bitset>
#include <iostream>

int main() {
  int x = 8;
  void *w = &x;
  bool val = *reinterpret_cast<const unsigned char*>(&x);
  bool *z = static_cast<bool *>(w);
  std::cout << "z (" << z << ") is " << *z << ": " << std::bitset<8>(*z) << "\n";
  std::cout << "val is " << val << ": " << std::bitset<8>(val) << "\n";
}

With -O3, this produced output:

z (0x7ffcaef0dba4) is 8: 00001000
val is 1: 00000001

However, with -O0, this produced output:

z (0x7ffe8c6c914c) is 0: 00000000
val is 1: 00000001

I know that dereferencing z invokes undefined behavior, which is why we are seeing inconsistent results. However, it seems that dereferencing the result of the reinterpret_cast and storing it in val does not invoke undefined behavior, and it reliably produces a value of 0 or 1.

Via (https://godbolt.org/z/f6s11Kr96), we see that gcc for x86 produces:

        lea     rax, [rbp-16]
        movzx   eax, BYTE PTR [rax]
        test    al, al
        setne   al
        mov     BYTE PTR [rbp-9], al

The effect of the test and setne instructions is to convert nonzero values to 1 (and to keep a zero value at 0). Is there some rule that states that reinterpret_casting from void * to const unsigned char * should have this behavior?


Solution

  • Accessing (i.e. reading) the value through *z (not merely dereferencing z itself) causes undefined behavior because it is an aliasing violation: z points to an object of type int, but the access is through an lvalue of type bool.

    Access through an lvalue of type unsigned char is specifically exempt from being an aliasing violation (see [basic.lval]/11.3).

    However, technically, the standard still does not specify what the result of accessing the int object through an unsigned char lvalue should be. The intent is that it yields the first byte of the object representation of the int object, but the standard is currently defective in not specifying that behavior. The paper P1839 attempts to resolve this defect.

    After reading this first byte from the object representation as an unsigned char value, you implicitly convert it to bool when initializing bool val from it. The conversion from unsigned char to bool is a conversion of values, not a reinterpretation of the object representation. A zero value is specified to convert to false and anything else to true (see [conv.bool]). That value conversion is what the test and setne instructions implement.

    Whether you cast through void* explicitly or cast the int* directly to unsigned char* or bool* doesn't matter at all. reinterpret_cast between object pointer types is specified to be equivalent to a static_cast to (suitably cv-qualified) void* followed by a static_cast to the target pointer type. (In your code, the static_cast from void* and the reinterpret_cast from int* therefore have the same effect.)