Tags: c, unicode, chinese-locale

How to refer to a Chinese character in C code


I have a C program that currently reads in Chinese text and stores it as type wchar_t. I want to look for a specific character in the text, but I am not sure how to refer to that character in the code.

I essentially want to say:

wchar_t character;

if (character == 个) {
    return 1;
}

else return 0;

Some logic has been omitted, obviously. How would I go about performing such logic on Chinese in C?

Edit: Got it to work. This code compiles with -std=c99, and prints out the character "个".

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
        wchar_t test[] = L"\u4E2A";  /* U+4E2A is the character 个 */
        setlocale(LC_ALL, "");       /* adopt the environment's locale so %ls can convert */
        printf("%ls", test);
        return 0;
}

Solution

  • If your compiler accepts source files in a Unicode encoding it supports, you can compare against the actual symbol. Otherwise, you can write the character as a \u escape inside a wide character constant:

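    #include <stdio.h>
    #include <wchar.h>  /* declares wchar_t */
    
    int main(void)
    {
        int i;
        wchar_t chinese[] = L"我不是中国人。";
        for(i = 0; chinese[i]; ++i)
        {
            /* literal symbol: the compiler must decode the source file correctly */
            if(chinese[i] == L'不')
                printf("found\n");
            /* \u escape: the constant itself is plain ASCII in the source */
            if(chinese[i] == L'\u4E0D')
                printf("also found\n");
        }
        return 0;
    }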
    

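    Note that a wide character string is written L"xxx", while a single wide character is written L'x'. A Unicode BMP code point can be specified with \uXXXX (four hex digits); code points outside the BMP use the eight-digit form \UXXXXXXXX.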

    FYI, I compiled this with Visual Studio 2012 using source encodings of UTF-8 with BOM, UTF-16 (little endian), and UTF-16 (big endian). UTF-8 without a BOM did not work.