Is there an easy way to convert a Unicode number to a std::wstring? E.g. I want to convert U+1E9E (= 16785054) to ẞ.
Depending on the platform you are running your code on, the encoding of the std::wstring will need to be either UTF-16 (i.e., Windows) or UTF-32 (i.e., most other OSes). Converting a codepoint number to either of those formats is trivial.
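If you are not sure which case applies on your toolchain, a quick throwaway check (not part of the answer's code) is to print sizeof(wchar_t); it is typically 2 bytes on Windows and 4 bytes on most other platforms:

#include <cstdio>

int main()
{
    // 2 bytes usually means UTF-16 code units, 4 bytes means UTF-32
    std::printf("wchar_t is %zu bytes\n", sizeof(wchar_t));
}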
On platforms where wchar_t is 32 bits in size (suitable for UTF-32), you can just cast the number as-is to wchar_t and then assign it to your wstring.
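For instance, on a Linux or macOS toolchain with a 32-bit wchar_t, a one-liner like this is enough (a minimal sketch; the full function below handles both cases):

#include <string>

std::wstring str(1, static_cast<wchar_t>(0x1E9E)); // U+1E9E, i.e. "ẞ"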
On platforms where wchar_t is 16 bits in size (suitable for UTF-16), you will have to use a small bit of math to convert the number to 1 or 2 wchar_ts based on its value, and then assign that result to your wstring.
For example:
std::wstring CodePointToWString(unsigned int codepoint)
{
    std::wstring str;
    if constexpr (sizeof(wchar_t) > 2) {
        // use UTF-32: every codepoint fits in a single wchar_t
        str = static_cast<wchar_t>(codepoint);
    }
    else {
        // use UTF-16
        if (codepoint <= 0xFFFF) {
            // BMP codepoints take a single code unit
            str = static_cast<wchar_t>(codepoint);
        }
        else {
            // codepoints above U+FFFF are split into a high/low surrogate pair
            codepoint -= 0x10000;
            str.resize(2);
            str[0] = static_cast<wchar_t>(0xD800 + ((codepoint >> 10) & 0x3FF));
            str[1] = static_cast<wchar_t>(0xDC00 + (codepoint & 0x3FF));
        }
    }
    return str;
}
...
std::wstring str = CodePointToWString(0x1E9E);
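As a quick sanity check (this usage sketch is mine, not the answer's), feed the function a codepoint above U+FFFF such as U+1F600 and dump the resulting code units; with a 16-bit wchar_t you should see the surrogate pair D83D DE00, and with a 32-bit wchar_t a single unit 1F600:

#include <cstdio>
#include <string>

int main()
{
    std::wstring str = CodePointToWString(0x1F600); // codepoint outside the BMP
    for (wchar_t cu : str) {
        // prints "D83D DE00 " on 16-bit wchar_t builds, "1F600 " on 32-bit ones
        std::printf("%04X ", static_cast<unsigned int>(cu));
    }
    std::printf("\n");
}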
FYI, U+1E9E is not 16785054, it is 7838. 16785054 would be U+1001E9E instead, which is not a valid codepoint.
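If you want to double-check those numbers, a throwaway snippet (mine, not the answer's) does the decimal/hex conversion:

#include <cstdio>

int main()
{
    std::printf("%d\n", 0x1E9E);   // prints 7838
    std::printf("%X\n", 16785054); // prints 1001E9E
}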