In C11, a new string literal was added with the prefix u8. This represents an array of chars with the text encoded as UTF-8. How is this even possible? Isn't a normal char signed, meaning it has one bit less of information to use because of the sign bit? My logic would dictate that a string of UTF-8 text would need to be an array of unsigned chars.
Isn't a normal char signed?
It's implementation-defined whether char is signed or unsigned.
Further, the sign bit isn't "lost"; it can still be used to represent information. Also, char is not necessarily 8 bits large (it might be larger on some platforms).