
Different font for Shift-JIS encoded string


In Java, I am reading an array of bytes from a Shift-JIS encoded file, but the characters in the resulting string look different from those in normal strings (wider?). Here is an example of what I mean, using the letter "P":
Ｐ - P
As you can see, the first one, decoded from Shift-JIS, looks different from the second. Is there a way to get "normal" characters from Shift-JIS strings as well?
I am using this piece of code to perform the conversion:

String jis = new String(byteArray, Charset.forName("Shift_JIS"));

Solution

  • Strictly speaking, these are different characters. The first is the Unicode character FULLWIDTH LATIN CAPITAL LETTER P (U+FF30), which comes from the Japanese JIS X 0208 character set. The second is LATIN CAPITAL LETTER P (U+0050) from ASCII.

    So, you have to convert fullwidth characters to halfwidth characters. You can do this with ICU4J's Transliterator.

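    // Requires ICU4J (com.ibm.icu.text.Transliterator).
    // Transform IDs read source-target, so "Fullwidth-Halfwidth" narrows fullwidth characters.
    Transliterator transliterator = Transliterator.getInstance("Fullwidth-Halfwidth");
    String result = transliterator.transliterate("Ｐ - P");
    System.out.println(result); // You will get "P - P"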
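
If pulling in ICU4J is not an option, the fullwidth-to-halfwidth conversion described above can be sketched with the JDK alone. This is a minimal sketch, not the ICU algorithm: it assumes only the fullwidth ASCII variants (U+FF01 to U+FF5E) and the ideographic space (U+3000) need folding, and it relies on those variants sitting at a fixed offset of 0xFEE0 from their ASCII counterparts. The class name `FullwidthToAscii` and the method `toHalfwidthAscii` are hypothetical names chosen for illustration.

```java
public class FullwidthToAscii {

    // Fullwidth forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed offset.
    private static final int OFFSET = 0xFF01 - 0x21; // 0xFEE0

    /** Hypothetical helper: folds fullwidth ASCII variants to plain ASCII. */
    public static String toHalfwidthAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                out.append((char) (c - OFFSET)); // e.g. U+FF30 becomes U+0050
            } else if (c == '\u3000') {
                out.append(' ');                 // ideographic space becomes ASCII space
            } else {
                out.append(c);                   // everything else is left untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String mixed = "\uFF30 - P"; // fullwidth P, then ASCII P
        String folded = toHalfwidthAscii(mixed);
        // Show the code point of the first character before and after folding.
        System.out.printf("U+%04X -> U+%04X%n", (int) mixed.charAt(0), (int) folded.charAt(0));
        System.out.println(folded);
    }
}
```

Japanese text such as kanji or kana passes through unchanged, so this can be applied to the whole decoded string. For anything beyond the ASCII variants, the ICU4J transliterator above is the more complete choice.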