cutf-8c99c89ansi-c

What must I know to handle UTF-8 in my C program?


I have a C program that now I need to do support to UTF-8 characters. What must I know in order to perform that? I've always hear how problematic is handle it in a C/C++ environment. Why exactly is it problematic? How does it differ from an usual C character, also its size? Can I do it without any operating system help, in pure C and still make it portable? what else I should have asked but I didn't? what I'm looking for implement is it: The characters are a name with accents(like french word: résumé) that I need to read it and put into a symbol table and then search and print them from a file. It's part of my configuration file parsing(very much .ini-like)


Solution

  • There's an awesome article written by Joel Spolsky, one of the Stack Overflow creators.

    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

    Apart from that, you might want to query some other Q&A's regarding this subject, like Handling special characters in C (UTF-8 encoding).

    As cited in the aforementioned Q&A, Tips on Using Unicode with C/C++ might give you the basics.