c++ standards language-design built-in-types std-byte

Why were charN_t designed as built-in types, but std::byte was not?


Why were char8_t, char16_t, char32_t designed as built-in types, but std::byte was not?

As per the C++ philosophy, if something can be implemented in the library, we almost always prefer doing so to modifying the core language. So it seems that char8_t, char16_t, and char32_t should also have been defined as enum classes, just as std::byte was.

Was there any special rationale behind the decision?


Solution

  • Reading the original proposal for char16_t and char32_t (N2249): They were based on the C proposal (WG14's N1040), which introduced char16_t and char32_t as type aliases for uint_least16_t and uint_least32_t.

    From the proposal on why they didn't use a typedef:

    Define new primitive types.

    Define char16_t to be a distinct new type, that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a distinct new type, that has the same size and representation as uint_least32_t.

    [N1040 defined char16_t and char32_t as typedefs to uint_least16_t and uint_least32_t, which make overloading on these characters impossible.]

    [The experiments on open-source software indicate that these identifiers are not commonly used, and when used, used in a manner consistent with the proposal.]

    So they needed a new distinct type for char16_t/char32_t (for function overloads). This was a few months before nullptr was finalised, but theoretically they could have done something similar:

    // <cuchar>
    namespace std {
        using char16_t = decltype(u'\0');
        using char32_t = decltype(U'\0');
    }
    

    and not made them new keywords. But these new keywords were added before decltype and were not changed before C++11 was finalised.
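
    To make the overloading argument from the proposal concrete, here is a minimal sketch (assuming C++17; describe is a hypothetical function invented for illustration, not something from N2249). It compiles only because char16_t is a distinct type; if char16_t were a typedef for uint_least16_t, as in N1040, the second definition would be a redefinition of the first.

    ```cpp
    #include <cstdint>
    #include <string>
    #include <type_traits>

    // char16_t is a distinct type, even though it has the same size and
    // representation as std::uint_least16_t.
    static_assert(!std::is_same_v<char16_t, std::uint_least16_t>);
    static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t));

    // Hypothetical overload set: legal only because the parameter types differ.
    std::string describe(std::uint_least16_t) { return "integer"; }
    std::string describe(char16_t) { return "UTF-16 code unit"; }
    ```

    Calling describe(u'a') selects the char16_t overload, while describe(std::uint_least16_t{97}) selects the integer one.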

    Regardless, they needed to be new primitive (fundamental) types for the crucial reason that there are literals of those types (both character literals and string literals, which are arrays of characters). It might be possible with today's compilers (much as 0 <=> 0 requires you to #include <compare>), but for 2007 I can say with confidence that it would have been too much to require a compiler to look up an enum definition in order to give a character literal its type.
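
    The point about literals can be checked directly. The following is a small sketch (assuming C++17 or later; code_unit_count is a hypothetical helper added only for illustration). It shows that the u and U literals already have these types with no header declaring them, whereas std::byte has no literal form and must be constructed from an integer.

    ```cpp
    #include <cstddef>
    #include <type_traits>

    // Character literals have the built-in types directly.
    static_assert(std::is_same_v<decltype(u'a'), char16_t>);
    static_assert(std::is_same_v<decltype(U'a'), char32_t>);

    // String literals are lvalue arrays of those types (two characters plus terminator).
    static_assert(std::is_same_v<decltype(u"hi"), const char16_t (&)[3]>);
    static_assert(std::is_same_v<decltype(U"hi"), const char32_t (&)[3]>);

    #if defined(__cpp_char8_t)
    // Only when char8_t is available (C++20); before that, u8'a' has type char.
    static_assert(std::is_same_v<decltype(u8'a'), char8_t>);
    #endif

    // std::byte has no literal syntax; a value is built from an integer instead.
    constexpr std::byte b{0x41};

    // Hypothetical helper: number of code units in a u"" literal, excluding the terminator.
    template <std::size_t N>
    constexpr std::size_t code_unit_count(const char16_t (&)[N]) { return N - 1; }
    ```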

    Also, std::byte deliberately does not provide arithmetic operators and has many more safety restrictions, which a char16_t compatible with C's char16_t cannot have (and you often do manipulate UTF-16 and UTF-32 code units with arithmetic).
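
    As a rough illustration of that last difference, the sketch below (assuming C++17; combine_surrogates and has_plus are hypothetical names introduced here) combines a UTF-16 surrogate pair using ordinary arithmetic on code units, and uses a detection idiom to confirm that the same + expression is not available for std::byte.

    ```cpp
    #include <cstddef>
    #include <type_traits>
    #include <utility>

    // Hypothetical helper: combines a UTF-16 surrogate pair into a code point.
    // Assumes high is in [0xD800, 0xDBFF] and low is in [0xDC00, 0xDFFF].
    constexpr char32_t combine_surrogates(char16_t high, char16_t low) {
        return 0x10000
             + ((static_cast<char32_t>(high) - 0xD800) << 10)
             + (static_cast<char32_t>(low) - 0xDC00);
    }

    // Detection idiom: true if the expression T + T is well-formed.
    template <class T, class = void>
    struct has_plus : std::false_type {};
    template <class T>
    struct has_plus<T, std::void_t<decltype(std::declval<T>() + std::declval<T>())>>
        : std::true_type {};

    static_assert(has_plus<char16_t>::value);    // code units support arithmetic
    static_assert(!has_plus<std::byte>::value);  // std::byte does not

    // U+1F600 is encoded in UTF-16 as the pair 0xD83D 0xDE00.
    static_assert(combine_surrogates(0xD83D, 0xDE00) == 0x1F600);
    ```

    To do arithmetic on a std::byte, you first have to convert it explicitly, for example with std::to_integer<int>(b).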