I'm simply trying to use stringstream with UTF-8:
#include <iostream>
#include <string>
#include <sstream>

int main()
{
    std::basic_stringstream<char8_t> ss(u8"hello");
    char8_t c;
    std::cout << (ss.rdstate() & std::ios_base::goodbit) << " " << (ss.rdstate() & std::ios_base::badbit) << " "
              << (ss.rdstate() & std::ios_base::failbit) << " " << (ss.rdstate() & std::ios_base::eofbit) << "\n";
    ss >> c;
    std::cout << (ss.rdstate() & std::ios_base::goodbit) << " " << (ss.rdstate() & std::ios_base::badbit) << " "
              << (ss.rdstate() & std::ios_base::failbit) << " " << (ss.rdstate() & std::ios_base::eofbit) << "\n";
    std::cout << c;
    return 0;
}
Compile using:
g++-9 -std=c++2a -g -o bin/test test/test.cpp
The result on screen is:
0 0 0 0
0 1 4 0
0
It seems that something goes wrong when reading c, but I don't know how to correct it. Please help me!
This is actually an old issue that is not specific to char8_t support. The same issue occurs with char16_t or char32_t in C++11 and newer. The following gcc bug report has a similar test case.
The issue is also discussed at the following:
The issue is that gcc does not implicitly imbue the global locale with facets for ctype<char8_t>, ctype<char16_t>, or ctype<char32_t>. When attempting to perform an operation that requires one of these facets, a std::bad_cast exception is thrown from std::__check_facet (it is subsequently silently swallowed by the IOS sentry object created for the character extraction operator, which then sets badbit and failbit).
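Since the failure comes from the whitespace-skipping step of formatted extraction, one possible way around it is to read with unformatted input instead. The following is only a minimal sketch, not something taken from the bug report: the helper name read_first_unit is hypothetical, and char16_t is used in place of char8_t so that it also builds in pre-C++20 modes. It assumes that get() never consults the ctype facet, which follows from its sentry being constructed with noskipws set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads the first code unit of `text` with unformatted
// input. get() constructs its sentry with noskipws set, so no whitespace is
// skipped and no ctype<char16_t> facet is looked up.
// Returns true and stores the code unit in `out` on success.
bool read_first_unit(const std::u16string& text, char16_t& out)
{
    std::basic_stringstream<char16_t> ss(text);
    return static_cast<bool>(ss.get(out));
}
```

Calling read_first_unit(u"hello", c) should leave the first code unit in c. Note that leading whitespace is not skipped this way, so the caller would have to handle it.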
The C++ standard only requires that ctype<char> and ctype<wchar_t> be provided. See [locale.category]p2.
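The guarantee for the two required facets can be checked with std::has_facet. This is just an illustrative sketch; the function name has_required_ctype_facets is made up for the example. The analogous check for ctype<char16_t> is deliberately left out, because whether it even compiles depends on the standard library implementation.

```cpp
#include <locale>

// Hypothetical helper: true if `loc` carries both ctype facets that the
// standard requires every locale to provide.
bool has_required_ctype_facets(const std::locale& loc)
{
    return std::has_facet<std::ctype<char>>(loc)
        && std::has_facet<std::ctype<wchar_t>>(loc);
}
```

For the global locale, has_required_ctype_facets(std::locale()) is expected to return true on any conforming implementation.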