c++ unicode type-conversion wstring

How to convert a unicode number to a std::wstring?


Is there an easy way to convert a Unicode number to a std::wstring? E.g., I want to convert U+1E9E (=16785054) to ẞ.


Solution

  • Depending on the platform your code runs on, the encoding of the std::wstring will need to be either UTF-16 (e.g., Windows) or UTF-32 (e.g., most other OSes). Converting a codepoint number to either of those formats is straightforward.

    On platforms where wchar_t is 32-bit in size, suitable for UTF-32, you can just cast the number as-is to wchar_t and then assign it to your wstring.

    On platforms where wchar_t is 16-bit in size, suitable for UTF-16, a codepoint up to U+FFFF fits in a single wchar_t. A higher codepoint needs a small bit of math to split it into a surrogate pair of 2 wchar_ts, which you then assign to your wstring.

    For example:

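    #include <string>

    // Requires C++17 for "if constexpr".
    // Note: the codepoint is not validated (must be <= 0x10FFFF and not a surrogate).
    std::wstring CodePointToWString(unsigned int codepoint)
    {
        std::wstring str;
    
        if constexpr (sizeof(wchar_t) > 2) {
            // use UTF-32: one wchar_t holds any codepoint
            str = static_cast<wchar_t>(codepoint);
        }
        else {
            // use UTF-16
            if (codepoint <= 0xFFFF) {
                // fits in a single code unit
                str = static_cast<wchar_t>(codepoint);
            }
            else {
                // split the remaining 20 bits into a surrogate pair
                codepoint -= 0x10000;
                str.resize(2);
                str[0] = static_cast<wchar_t>(0xD800 + ((codepoint >> 10) & 0x3FF)); // high surrogate
                str[1] = static_cast<wchar_t>(0xDC00 + (codepoint & 0x3FF));         // low surrogate
            }
        }
    
        return str;
    }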
    
    ...
    
    std::wstring str = CodePointToWString(0x1E9E);
    

    FYI, U+1E9E is not 16785054, it is 7838 (0x1E9E in decimal). 16785054 would be U+1001E9E instead, which is not a valid codepoint because it exceeds the maximum of U+10FFFF.
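
The arithmetic in the note above, and the surrogate-pair math from the UTF-16 branch, can be checked at compile time with a small sketch. The names IsValidCodePoint, SurrogatePair and ToSurrogatePair are hypothetical helpers introduced here for illustration only; they are not part of the answer's code or of any library. The sketch uses char16_t rather than wchar_t so the result does not depend on the platform, and it assumes C++14 or later.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if cp is a Unicode scalar value, i.e. at most
// U+10FFFF and outside the surrogate range U+D800..U+DFFF.
constexpr bool IsValidCodePoint(std::uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical type holding the two UTF-16 code units of a surrogate pair.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

// Hypothetical helper: splits a codepoint above U+FFFF into its surrogate
// pair, using the same math as the UTF-16 branch of CodePointToWString.
constexpr SurrogatePair ToSurrogatePair(std::uint32_t cp)
{
    assert(cp > 0xFFFF && cp <= 0x10FFFF); // precondition: supplementary plane only
    cp -= 0x10000;                          // leaves a 20-bit value
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + ((cp >> 10) & 0x3FF)), // top 10 bits
        static_cast<char16_t>(0xDC00 + (cp & 0x3FF))          // bottom 10 bits
    };
}

// The numbers from the question and the note.
static_assert(0x1E9E == 7838, "U+1E9E is 7838 in decimal");
static_assert(16785054 == 0x1001E9E, "16785054 is 0x1001E9E");
static_assert(IsValidCodePoint(0x1E9E), "U+1E9E is valid");
static_assert(!IsValidCodePoint(0x1001E9E), "0x1001E9E is above U+10FFFF");

// A worked surrogate pair: U+1F600 encodes as D83D DE00 in UTF-16.
static_assert(ToSurrogatePair(0x1F600).high == 0xD83D, "high surrogate");
static_assert(ToSurrogatePair(0x1F600).low == 0xDE00, "low surrogate");
```

A check like IsValidCodePoint could be called before CodePointToWString if the input comes from an untrusted source, since the function in the answer converts whatever number it is given.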