[SOLVED] How to convert a codepoint to utf-8?

I have some code that reads in a Unicode code point (escaped in a string as 0xF00).

Since I'm using Boost, I'm wondering whether the following is the best (and correct) approach:

unsigned int codepoint{0xF00};
boost::locale::conv::utf_to_utf<char>(&codepoint, &codepoint+1);

Solution

As mentioned, a codepoint in this form is (conveniently) UTF-32, so what you're looking for is a transcoding.
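
To make the transcoding concrete, here is a minimal hand-rolled sketch of what a UTF-32 to UTF-8 conversion does for a single code point. The name encode_utf8 is hypothetical (it is not part of Boost or UTF8-CPP), and returning the bytes in a std::string is just one possible choice.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost or UTF8-CPP): encodes one Unicode
// code point as UTF-8 and returns the resulting bytes in a std::string.
std::string encode_utf8(std::uint32_t cp)
{
   // Surrogates (U+D800..U+DFFF) and values above U+10FFFF cannot be encoded.
   if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      throw std::invalid_argument("invalid code point");

   std::string out;
   if (cp < 0x80) {            // 1 byte:  0xxxxxxx
      out += static_cast<char>(cp);
   } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
   } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
   } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
   }
   return out;
}
```

For the code point in the question, encode_utf8(0xF00) yields the three bytes E0 BC 80.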

For a solution that does not rely on functions deprecated since C++17, isn't really ugly, and does not require hefty third-party libraries, you can use the very lightweight UTF8-CPP (four small headers!) and its function utf8::utf32to8.

It's going to look something like this:

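#include <cstdint>
#include <iterator>
#include <vector>
#include "utf8.h"  // UTF8-CPP

int main()
{
   const uint32_t codepoint{0xF00};
   std::vector<unsigned char> result;

   try
   {
      // Appends the UTF-8 encoding of each code point in the range to result
      utf8::utf32to8(&codepoint, &codepoint + 1, std::back_inserter(result));
   }
   catch (const utf8::invalid_code_point&)
   {
      // The input was not a valid code point; handle the error here
   }
}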

(There's also a utf8::unchecked::utf32to8, if you're allergic to exceptions.)

(And, since C++20, consider writing the output into a std::vector<char8_t> or a std::u8string.)

(Finally, note that I've specifically used uint32_t to ensure the input has the proper width.)
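
Since the code point in the question starts life as an escaped string such as 0xF00, here is one possible sketch of getting it into a uint32_t before transcoding. The helper name parse_code_point is hypothetical, and the sketch assumes the text is hexadecimal with an optional 0x prefix.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// Hypothetical helper: parses text such as "0xF00" or "F00" as a hexadecimal
// code point. Returns false if parsing fails, if trailing characters remain,
// or if the value is above U+10FFFF. Leading whitespace is tolerated because
// std::stoul skips it.
bool parse_code_point(const std::string& text, std::uint32_t& out)
{
   try {
      std::size_t used = 0;
      const unsigned long value = std::stoul(text, &used, 16);
      if (used != text.size() || value > 0x10FFFF)
         return false;
      out = static_cast<std::uint32_t>(value);
      return true;
   } catch (const std::exception&) {
      // std::stoul throws std::invalid_argument or std::out_of_range
      return false;
   }
}
```

The range check only rejects values above U+10FFFF; surrogates still get through, so the checked utf8::utf32to8 remains useful as a second line of defence.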

I tend to use this library in projects until I need something a little heavier for other purposes (at which point I'll typically switch to ICU).