I have an array which contains a mix of Japanese and ASCII characters, and I am trying to determine whether each character read is an English (ASCII) character or a Japanese character.
To solve this I followed the approach below.
The algorithm above works fine, but it fails for the half-width forms of Japanese characters, e.g. シ, ァ, etc., because they are only one byte long. How can I find out whether a character is Japanese or English?
**Note:** What I tried: I read on the web that the first byte tells whether a character is Japanese or not, which I covered in step 1 of my algorithm. But that does not work for half-width characters.
**EDIT:** In the problem I am solving, I place the control character 0x80 at the start and end of my characters to delimit the string:

```
cntlchar.....(my characters, can be Japanese).....cntlchar
```

I wrote the following to detect the end of the control character:

```c
if ((buf[*p+1] & 0x80) && (mbMBCS_charWidth(&buf[*p]) == 1))
    // end of control characters reached
else
    // (*p)++;
```

It worked fine for English but did not work for Japanese half-width characters.
How can I handle this?
Your data must be using Windows code page 932. That is a guess, but examining the code points matches what you are describing.
The code page shows that bytes in the range 0x00 to 0x7F are "English" (a better description is "7-bit ASCII"), bytes in the ranges 0x81 to 0x9F and 0xE0 to 0xFF are the first byte of a double-byte code, and everything between 0xA1 and 0xDF is a half-width kana character.