I have a C program that currently reads in Chinese text and stores it as type wchar_t. What I want to do is look for a specific character in the text, but I am not sure how to refer to the character in the code.
I essentially want to say:
wchar_t character;
if (character == 个) {
return 1;
}
else return 0;
Some logic has been omitted, obviously. How would I go about performing such logic on Chinese in C?
Edit: Got it to work. This code compiles with -std=c99, and prints out the character "个".
#include <locale.h>
#include <stdio.h>
#include <wchar.h>


int main() {
    wchar_t test[] = L"\u4E2A";
    setlocale(LC_ALL, "");
    printf("%ls", test);
}
If your compiler accepts source files in a supported Unicode encoding, you can compare against the actual symbol directly. Otherwise, you can write the wide character constant with a \u escape:
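#include <stdio.h>
#include <wchar.h>   /* wchar_t */

int main()
{
    int i;
    wchar_t chinese[] = L"我不是中国人。";

    /* Walk the string until the terminating L'\0'. */
    for(i = 0; chinese[i]; ++i)
    {
        /* Literal symbol: needs a source encoding the compiler understands. */
        if(chinese[i] == L'不')
            printf("found\n");
        /* Same character written as a code point escape. */
        if(chinese[i] == L'\u4E0D')
            printf("also found\n");
    }
    return 0;
}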
Note a wide character string is L"xxx" while a wide character is L'x'. A Unicode BMP code point can be specified with \uXXXX.
FYI, I compiled with Visual Studio 2012 with source encodings of UTF-8 with BOM, UTF-16 (little endian), and UTF-16 (big endian). UTF-8 without BOM did not work.
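To tie this back to the check in the question, here is a minimal sketch that looks for 个 (U+4E2A) using only \u escapes, so the source file stays plain ASCII and its encoding should not matter. The names TARGET_CHAR, is_target and contains_target are made up for illustration; wcschr is the standard wide-character counterpart of strchr from <wchar.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* U+4E2A is the character from the question; the \u escape keeps the
   source file pure ASCII. (TARGET_CHAR is a hypothetical name.) */
#define TARGET_CHAR L'\u4E2A'

/* Hypothetical helper: the single-character check from the question. */
bool is_target(wchar_t character)
{
    return character == TARGET_CHAR;
}

/* Hypothetical helper: true if the target occurs anywhere in a
   null-terminated wide string. wcschr returns NULL when not found. */
bool contains_target(const wchar_t *text)
{
    return wcschr(text, TARGET_CHAR) != NULL;
}
```

You would call is_target on each wchar_t as you read it, or contains_target on a whole buffer once it has been read in.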