
What is the portable way to cast to-and-from a `char`, preserving the same bit pattern?


Many functions take char as the 'byte type'¹. However, it is unclear (to me) whether the result of casting an unsigned char to a char is implementation-defined when the unsigned char's value is greater than CHAR_MAX.

This section is from C++11 (ISO/IEC 14882:2011) §4.7 Integral conversions [conv.integral]/3:

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.

This section of the standard is also mentioned in a similar question, but that question does not address how to preserve the bit pattern in these conversions.
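For concreteness, here is a minimal sketch of the naive round trip through static_cast that the quoted rule applies to. The helper names naive_to_char and naive_from_char are hypothetical, for illustration only.

```cpp
#include <cassert>

// The straightforward round trip through static_cast.
// Helper names are hypothetical, for illustration only.
inline char naive_to_char(unsigned char u) {
    // If char is signed and u > CHAR_MAX (e.g. 200 with an 8-bit char),
    // the result is implementation-defined before C++20
    // ([conv.integral]/3 in C++11).
    return static_cast<char>(u);
}

inline unsigned char naive_from_char(char c) {
    // Always well-defined: conversion to an unsigned type is reduced
    // modulo 2^N ([conv.integral]/2 in C++11).
    return static_cast<unsigned char>(c);
}
```

Only the first conversion is in question; the conversion back to unsigned char is fully specified in every version.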

So, how should I cast a uint8_t, unsigned char, or other char-width type to-and-from a char, preserving the bit pattern?

Note: This question is about C++ in general, not a specific version. If something was unacceptable in C++11, but works in C++14, then it would be useful if the answer could contrast the two versions.

1: for example, x64 SIMD intrinsics such as __m128i _mm_set1_epi8(char) use char.


Solution

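  • From C++20, you can use std::bit_cast, which guarantees a bit-for-bit identical result for types without padding bits; char and unsigned char have none. C++20 also requires two's complement and defines the conversion to a signed type as reduction modulo 2^N, so a plain static_cast preserves the bit pattern there too.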

    Prior to C++20, you will have to check the documentation of each implementation you intend to use. The implementations¹ I am aware of all choose not to modify the bit pattern when producing their implementation-defined values.

    The standard leaves the value implementation-defined because the bit patterns that correspond to negative numbers represent different values under two's complement, ones' complement, and sign-and-magnitude.

    Failing that, you can std::memcpy an lvalue unsigned char into an lvalue char, which copies the object representation byte for byte and is valid in every C++ version.

    1. Even when the platform isn't two's complement, it doesn't make sense to change the bit pattern when casting signed <-> unsigned. You might end up with -0, or with a different value from the one a two's complement machine would give, but unless the implementation is going out of its way to be obtuse, the bit pattern will be the same.
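Before C++20, checking the documentation can be backed up with a test on each implementation you target. GCC, for instance, documents that converting to a signed type of width N reduces the value modulo 2^N. The sketch below (the function name is hypothetical; C++11 or later) compares the bits produced by static_cast<char> with the original bits for every unsigned char value.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Hypothetical name. Returns true if static_cast<char> keeps the bit
// pattern of every unsigned char value on this implementation.
bool static_cast_preserves_bits() {
    for (unsigned v = 0; v <= UCHAR_MAX; ++v) {
        const unsigned char original = static_cast<unsigned char>(v);
        // Implementation-defined before C++20 when char is signed and
        // original > CHAR_MAX; this is exactly what we want to observe.
        const char converted = static_cast<char>(original);
        // Read the object representation of the result back out.
        unsigned char converted_bits;
        std::memcpy(&converted_bits, &converted, 1);
        if (converted_bits != original) {
            return false;
        }
    }
    return true;
}
```

Calling it once from a unit test, for example assert(static_cast_preserves_bits());, turns the documentation check into something the build can catch.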
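To see why the standard hedges, here is a small worked example that decodes one 8-bit pattern under the three representations listed above. The helper names are hypothetical, and the code uses only arithmetic on the unsigned value, so it gives the same answers on any host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: interpret an 8-bit pattern under each signed
// representation that C++ allowed before C++20.
int as_twos_complement(std::uint8_t bits) {
    return bits < 0x80 ? bits : static_cast<int>(bits) - 0x100;
}

int as_ones_complement(std::uint8_t bits) {
    // A negative value is the bitwise complement of its magnitude;
    // 0xFF is "negative zero", which decodes to 0 here.
    return bits < 0x80 ? bits : -static_cast<int>(0xFF - bits);
}

int as_sign_magnitude(std::uint8_t bits) {
    // The top bit is the sign, the low seven bits are the magnitude;
    // 0x80 is "negative zero", which decodes to 0 here.
    return bits < 0x80 ? bits : -static_cast<int>(bits & 0x7F);
}
```

For the pattern 0xC8 (200 as an unsigned char) these return -56, -55 and -72 respectively, so no single value could be mandated for the conversion without favouring one representation.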
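A minimal sketch of the std::memcpy fallback mentioned above, which works in C++11 and later. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names. std::memcpy copies the object representation byte
// for byte; char and unsigned char are both exactly one byte and
// trivially copyable, so the destination gets the same bits as the source.
inline char uchar_to_char_memcpy(unsigned char u) {
    char c;
    std::memcpy(&c, &u, sizeof c);
    return c;
}

inline unsigned char char_to_uchar_memcpy(char c) {
    unsigned char u;
    std::memcpy(&u, &c, sizeof u);
    return u;
}
```

Compilers typically optimise these single-byte copies away, so the fallback normally costs nothing compared with a cast.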