Why were char8_t, char16_t, char32_t designed as built-in types, but std::byte was not?
As per the C++ philosophy, if something can be implemented in the library, we almost always prefer doing so over modifying the core language. So it seems char8_t, char16_t, and char32_t should also have been defined as enum classes, just as std::byte was.
Was there any special rationale behind the decision?
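For reference, std::byte is purely a library type; <cstddef> defines it roughly like this (simplified; the real header also declares the shift and bitwise operators and std::to_integer as constexpr free functions):
// <cstddef> (simplified sketch)
namespace std {
enum class byte : unsigned char {};
// operator<<, operator>>, operator|, operator&, operator^, operator~
// and to_integer<T>(byte) are provided as constexpr non-member functions
}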
Reading the original proposal for char16_t and char32_t (N2249): they were based on the C proposal (WG14's N1040), which introduced char16_t and char32_t as type aliases for uint_least16_t and uint_least32_t.
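In other words, on the C side these are plain library typedefs rather than distinct types; a simplified sketch of what <uchar.h> provides under that approach:
/* <uchar.h> in C (simplified sketch) */
#include <stdint.h>
typedef uint_least16_t char16_t;
typedef uint_least32_t char32_t;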
From the proposal on why they didn't use a typedef:
Define new primitive types. Define char16_t to be a distinct new type, that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a distinct new type, that has the same size and representation as uint_least32_t. [N1040 defined char16_t and char32_t as typedefs to uint_least16_t and uint_least32_t, which make overloading on these characters impossible.] [The experiments on open-source software indicate that these identifiers are not commonly used, and when used, used in a manner consistent with the proposal.]
So they needed a new distinct type for char16_t/char32_t (for function overloads). This was a few months before nullptr was finalised, but theoretically they could have done something similar:
// <cuchar>
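// hypothetical: deduce the types from the u and U character literals instead of reserving new keywords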
namespace std {
using char16_t = decltype(u'\0');
using char32_t = decltype(U'\0');
}
and not made them new keywords. But these new keywords were added before decltype was, and they were not changed before C++11 was finalised.
Regardless, they needed to be new primitive (fundamental) types for the crucial reason that there are literals of those types (both character literals and string literals, which are arrays of those characters). It might be possible with today's compilers (like how 0 <=> 0 requires you to #include <compare>), but in 2007 I can say for certain that it would have been too much to require the compiler to check for an enum definition in order to use a character literal.
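A quick sketch of both halves of that point, i.e. literals whose types the compiler must know without seeing any header, versus the later <compare> approach where a built-in operator does depend on a library type:
#include <compare>     // required: the result type of 0 <=> 0, std::strong_ordering, lives in the library
#include <type_traits>

// Character and string literals have char16_t/char32_t types baked into the core language:
static_assert(std::is_same_v<decltype(u'\0'), char16_t>);
static_assert(std::is_same_v<decltype(U"ab"), const char32_t(&)[3]>);

// By contrast, the result of the built-in <=> is a library-defined type:
static_assert((0 <=> 0) == 0);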
Also, std::byte explicitly does not allow arithmetic operators and has a lot more safety built in, which a char16_t compatible with C's char16_t can't have (and you often do manipulate UTF-16 and UTF-32 code units with arithmetic).