c++ unicode io c++23 char32-t

How do I print code point names like "NO-BREAK SPACE"?


I have some software that supports UTF-8 config files. It doesn't have extensive Unicode support; it only parses the files. If the file contains a code point that the format disallows, I would like to convert it for printing with a function like:

std::string code_point_to_string(char32_t c);

For example, if c is U'\N{ZERO WIDTH JOINER}' (U+200D), I would like to return "ZERO WIDTH JOINER" as a string.

Is there something in the C++ standard library or in {fmt} that I could use to accomplish this?

I know that returning "U+200D" would be pretty easy, but those U+ representations are somewhat user-hostile; you have to look up what the characters mean.


Solution

  • ICU (libicu, which is probably already present on your system) provides a u_charName function for this purpose.