Tags: java, unicode, encoding, ascii

Java: decoding a Unicode web-service response to readable text


I am developing a Java application that consumes a web service. The web service is provided by a SAP server, which automatically encodes the data in Unicode, so I receive a Unicode string from the web service.

" 倥䙄ㄭ㌮਍쿣ී㈊〠漠橢਍圯湩湁楳湅潣楤杮਍湥潤橢਍″‰扯൪㰊഼┊敄瑶灹⁥佐呓′†䘠湯⁴佃剕䕉⁒渠牯慭慌杮䔠ൎ⼊祔数⼠潆瑮਍匯扵祴数⼠祔数റ⼊慂敳潆瑮⼠潃牵敩൲⼊慎敭⼠う㄰਍䔯据摯湩⁧′‰൒㸊ാ攊摮扯൪㐊〠漠橢਍㰼਍䰯湥瑧⁨‵‰൒㸊ാ猊牴慥൭ 䘯〰‱⸱2 "

The above is the response.

I want to convert it to readable text (a normal String). I am using core Java.


Solution

  • If you have byte[] or an InputStream (both binary data) you can get a String or a Reader (both text) with:

    final String encoding = "UTF-8"; // or "UTF-16LE", "UTF-16BE"
    
    byte[] b = ...;
    String s = new String(b, encoding);
    
    InputStream is = ...;
    BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding));
    String line;
    while ((line = reader.readLine()) != null) {
        // process each line here
    }
    

    The reverse process uses:

    byte[] b = s.getBytes(encoding);
    OutputStream os = ...;
    
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, encoding));
    writer.write(s);
    writer.newLine();
    

    Unicode is a numbering system for all characters. The UTF variants implement Unicode as bytes.
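    The distinction can be seen by encoding the same single character with two UTF variants; the byte counts differ while the Unicode code point is the same. A minimal sketch (class name is illustrative):

```java
import java.nio.charset.StandardCharsets;

// One Unicode code point (U+4E2D, a CJK character) becomes a different
// number of bytes depending on which UTF variant encodes it.
public class UnicodeVsUtf {
    public static void main(String[] args) {
        String s = "中"; // one code point, one char
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);       // 3 bytes
        byte[] utf16be = s.getBytes(StandardCharsets.UTF_16BE); // 2 bytes
        System.out.println(utf8.length);    // 3
        System.out.println(utf16be.length); // 2
    }
}
```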


    Your problem:

    Normally (with a web service) you would already have received a String. You could, for instance, write that string to a file using the Writer above, either to inspect it yourself with a full Unicode font, or to pass the file on for a check.

    You may need to check which UTF variant the text is in. For Asian scripts, UTF-16 (little endian or big endian) is the most compact. In XML the encoding would already be declared.
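    If the data begins with a byte-order mark (BOM), that gives a quick hint about the UTF variant. A minimal sketch (class and method names are illustrative; real charset detection, e.g. with a library like juniversalchardet, inspects far more than the BOM):

```java
// Guesses a charset name from a leading byte-order mark, or returns
// null when no BOM is present. Only a BOM check, not full detection.
public class BomSniffer {
    public static String detect(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null; // no BOM: encoding must be known from elsewhere
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41}));
    }
}
```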


    Addition:

    FileWriter writes to a file using the default encoding of the operating system on your machine, which varies between systems. Instead use:

    new OutputStreamWriter(new FileOutputStream(new File("...")), "UTF-8")
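    Put together, writing a string to a file with an explicit encoding looks roughly like this (a sketch; the temp file and class name are just for the demo):

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Writes text with an explicit UTF-8 encoding instead of relying on
// FileWriter's platform-default charset.
public class ExplicitEncodingWrite {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("demo", ".txt");
        try (BufferedWriter w = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file.toFile()), "UTF-8"))) {
            w.write("日本語");
        }
        byte[] raw = Files.readAllBytes(file);
        System.out.println(raw.length); // 9: three CJK chars, 3 UTF-8 bytes each
    }
}
```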
    

    If it is a binary PDF, as @bobince said, just use a FileOutputStream on the byte[] or InputStream: no Reader, no Writer, and no encoding at all.
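    For that binary case, a plain byte-for-byte copy is enough. A minimal sketch (class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Copies binary data (e.g. a PDF payload) untouched: bytes in, bytes
// out, with no charset conversion anywhere.
public class BinaryCopy {
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x25, 0x50, 0x44, 0x46}; // "%PDF" header bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out);
        System.out.println(out.size()); // 4
    }
}
```

    To write the bytes to disk instead, pass a FileOutputStream as the destination.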