When going from std::u16string to, let's say, std::u32string, std::wstring_convert doesn't work, as it expects char. So how does one use std::wstring_convert to convert between UTF-16 and UTF-32 with std::u16string as input?
For example:
inline std::u32string utf16_to_utf32(const std::u16string& s) {
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    return conv.from_bytes(s); // cannot do this, expects 'char'
}
Is it OK to reinterpret_cast to char, as I've seen in a few examples? And if you do need to reinterpret_cast, I've seen some examples use the string size rather than the total byte size for the pointers. Is that an error or a requirement?
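For reference, the examples I've seen look roughly like this (my own sketch, assuming a little-endian platform, since std::codecvt_utf16 treats the byte stream as big-endian unless std::little_endian is passed):
#include <codecvt>
#include <locale>
#include <string>

inline std::u32string utf16_to_utf32(const std::u16string& s) {
    std::wstring_convert<
        std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>,
        char32_t> conv;
    // from_bytes consumes bytes, so the range must span
    // s.size() * sizeof(char16_t) bytes. Doing the pointer
    // arithmetic on char16_t* before the cast gets that right;
    // adding s.size() to a char* would only cover half the data.
    const char* first = reinterpret_cast<const char*>(s.data());
    const char* last  = reinterpret_cast<const char*>(s.data() + s.size());
    return conv.from_bytes(first, last);
}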
I know codecvt is deprecated, but until the standard offers an alternative, it will have to do.
If you do not want to reinterpret_cast, the only way I've found is to first convert to UTF-8, then reconvert to UTF-32. For example:
// Convert UTF-16 to UTF-8.
std::u16string s;
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf8_conv;
std::string utf8_str = utf8_conv.to_bytes(s);

// Convert UTF-8 to UTF-32.
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utf32_conv;
std::u32string utf32_str = utf32_conv.from_bytes(utf8_str);
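Wrapped into a self-contained helper (the name is my own), the round trip looks like this:
#include <codecvt>
#include <locale>
#include <string>

// UTF-16 -> UTF-8 -> UTF-32, avoiding any reinterpret_cast.
inline std::u32string utf16_to_utf32_via_utf8(const std::u16string& s) {
    // Step 1: UTF-16 -> UTF-8.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> to_utf8;
    const std::string utf8_str = to_utf8.to_bytes(s);
    // Step 2: UTF-8 -> UTF-32.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> to_utf32;
    return to_utf32.from_bytes(utf8_str);
}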
Yes, this is sad, and it likely contributed to codecvt's deprecation.