In Visual Studio (C++) I declared a std::wstring c and filled it with a surrogate pair (code point U+1F01C, a Mahjong tile):
#include <iostream>
#include <string>

int main() {
    std::cout << std::hex << 16;
    std::cout << "Hello World!\n";
    std::wstring c = L"\U0001F01C";
    wchar_t* ctest = &c[0];
    std::cout << "Checking value: " << *ctest << ".." << std::endl;
}
When I print the value, I would expect to get back that Unicode code point, but instead I get d83c.
Can anyone tell me why I don't get the full Unicode value?
10Hello World!
Checking value: d83c..
You just need to do the reverse operation that creates a UTF-16 surrogate pair.
For code points in the range U+10000 to U+10FFFF, UTF-16 encoding works like this:
0x010000 is subtracted from the code point, leaving a 20-bit number in the range 0..0x0FFFFF.
The top ten bits (a number in the range 0..0x03FF) are added to 0xD800 to give the first 16-bit code unit or high surrogate, which will be in the range 0xD800..0xDBFF.
The low ten bits (also in the range 0..0x03FF) are added to 0xDC00 to give the second 16-bit code unit or low surrogate, which will be in the range 0xDC00..0xDFFF.
To reconstitute the surrogate pair into a Unicode code point, just do the opposite:
#include <cstdint>
#include <iostream>
#include <string>

int main() {
    std::cout << std::hex << 16 << "\n";
    std::cout << "Hello World!\n";
    std::u16string c = u"\U0001F01C";
    char16_t* ctest = &c[0];
    // Cast so the code unit prints as a number (C++20 deletes the
    // char16_t overload of operator<< for narrow streams).
    std::cout << "Checking value: " << static_cast<std::uint32_t>(*ctest) << ".." << "\n";
    // Recombine: take ten bits from each surrogate, then add back the
    // 0x10000 subtracted during encoding. Note that OR-ing with 0x10000
    // instead of adding would give wrong results for U+20000 and above,
    // because the 20-bit value can itself have bit 16 set.
    std::uint32_t high = ctest[0] & 0x03FF;
    std::uint32_t low  = ctest[1] & 0x03FF;
    std::uint32_t codepoint = ((high << 10) | low) + 0x10000;
    std::cout << "Checking value: " << codepoint << ".." << "\n";
}