I came across two code snippets:

```cpp
std::wstring str = std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>().from_bytes("some utf8 string");
```

and

```cpp
std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some utf8 string");
```

Are they both correct ways to convert UTF-8 stored in a `std::string` to UTF-16 in a `std::wstring`?
`codecvt_utf8_utf16` does exactly what it says: it converts between UTF-8 and UTF-16, both of which are well-understood and portable encodings.
`codecvt_utf8` converts between UTF-8 and UCS-2/UCS-4 (depending on the size of the element type you give it). UCS-2 and UTF-16 are not the same thing: UCS-2 has no surrogate pairs, so it can only represent code points in the Basic Multilingual Plane.
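For concreteness, here is a small sketch of the difference, using `char16_t` and `char32_t` so the result doesn't depend on the platform's `wchar_t` size (the emoji and the program itself are my own illustration, not taken from the snippets above):

```cpp
#include <codecvt>   // deprecated since C++17, but still available in practice
#include <iostream>
#include <locale>
#include <string>

int main() {
    // U+1F600 (an emoji outside the BMP), hand-encoded as UTF-8 bytes.
    const std::string utf8 = "\xF0\x9F\x98\x80";

    // codecvt_utf8_utf16: UTF-8 -> UTF-16. A non-BMP character becomes
    // a surrogate pair, i.e. two 16-bit code units.
    std::u16string utf16 =
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>()
            .from_bytes(utf8);

    // codecvt_utf8: UTF-8 -> UCS-4 here, because char32_t is 32 bits wide.
    // The same character is a single 32-bit code unit.
    std::u32string ucs4 =
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>()
            .from_bytes(utf8);

    std::cout << "UTF-16 code units: " << utf16.size() << '\n';  // prints 2
    std::cout << "UCS-4 code units:  " << ucs4.size()  << '\n';  // prints 1
}
```

On a platform where `wchar_t` is 16 bits (Windows), `codecvt_utf8<wchar_t>` would give UCS-2 instead, which cannot represent this character at all.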
So if your goal is to store genuine, actual UTF-16 in a `wchar_t`, then you should use `codecvt_utf8_utf16`. However, if you're trying to do cross-platform coding with `wchar_t` as some kind of Unicode-ish type, you can't: the UTF-16 facet always converts to UTF-16, whereas `wchar_t` on non-Windows platforms is generally expected to be UTF-32/UCS-4. By contrast, `codecvt_utf8` only converts to UCS-2/4, but on Windows, `wchar_t` strings are "supposed" to be full UTF-16.
So you can't write code that satisfies all platforms without some `#ifdef` or template work: on Windows, you should use `codecvt_utf8_utf16`; on non-Windows, you should use `codecvt_utf8`.
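A minimal sketch of that `#ifdef` approach (the helper name `utf8_to_wide` is just for illustration, not a standard function):

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Convert UTF-8 to whatever the platform expects a wstring to hold:
// UTF-16 on Windows (16-bit wchar_t), UTF-32/UCS-4 elsewhere.
std::wstring utf8_to_wide(const std::string& utf8) {
#ifdef _WIN32
    return std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>()
        .from_bytes(utf8);
#else
    return std::wstring_convert<std::codecvt_utf8<wchar_t>>()
        .from_bytes(utf8);
#endif
}
```

Note that `wstring_convert::from_bytes` throws `std::range_error` if the conversion fails, e.g. on malformed UTF-8 input.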
Or better yet, just use UTF-8 internally and find APIs that directly take strings in a specific format, rather than platform-dependent `wchar_t` stuff.