Tags: c++, unicode, ascii, multibyte, multibyte-functions

How to find whether a byte read is Japanese or English?


I have an array that contains both Japanese and ASCII characters. I am trying to determine whether each character read is an English character or a Japanese character.

To solve this, I followed these steps:

  1. Read the first byte. If the multibyte character width is not equal to one, move the pointer to the next byte, display both bytes together as one character, and report that a Japanese character has been read.
  2. If the multibyte character width is equal to one, display the byte and report that an English character has been read.

The algorithm above works, but it fails for half-width Japanese characters, e.g. the half-width forms of シ and ァ (ｼ, ｧ), because they are only one byte long. How can I find out whether a character is Japanese or English?

**Note:** What I tried: I read on the web that the first byte tells whether a character is Japanese, which I covered in step 1 of my algorithm. But that does not work for half-width characters.

EDIT: In the problem I am solving, I put the control character 0x80 at the start and end of my characters to delimit the string. I wrote the following to detect the closing control character:

cntlchar.....(my characters, can be Japanese).....cntlchar

    if ((buf[*p + 1] & 0x80) && (mbMBCS_charWidth(&buf[*p]) == 1)) {
        // end of control characters reached
    } else {
        (*p)++;  // advance the index; *p++ would advance the pointer instead
    }

It works fine for English but not for half-width Japanese characters.

How can I handle this?


Solution

  • Your data is most likely encoded in Windows code page 932 (Microsoft's variant of Shift JIS). That is a guess, but the byte values match what you are describing.

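    The code page table shows that bytes in the range 0x00 to 0x7F are "English" (a better description is "7-bit ASCII"), bytes in the ranges 0x81 to 0x9F and 0xE0 to 0xFC are the first byte of a double-byte code, and everything between 0xA1 and 0xDF is a single-byte half-width katakana character. So checking the character width alone is not enough: a single byte in 0xA1 to 0xDF is Japanese, not English.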