java file-io unicode character-encoding fileoutputstream

Byte streams in Java


Can we write Unicode data to a file with byte streams? My code is:

 public static void main(String[] args) throws Exception {

    String str = "Русский язык ";
    FileOutputStream fos = new FileOutputStream("file path");
    fos.write(str.getBytes());
    fos.flush();
    fos.close();
}

Here I am using a byte stream to write Unicode data, and it is written correctly. I am new to Java, but I have read that byte streams do not support Unicode characters. So why does it work in this case?


Solution

  • I have read that byte streams do not support Unicode characters.

    Either you have used a bad source of information or you have misunderstood something. Byte streams support bytes. Therefore byte streams support anything that can be represented in bytes: videos, text, pictures, music... If a byte stream couldn't carry it, it couldn't be used in a digital computer at all.

    The trick to representing those things in what is simply a sequence of 1s and 0s is to use agreed-upon rules. You encode your text according to certain rules, and the receiver can then decode it back using the same rules.
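As a small illustration of "agreed-upon rules", the sketch below encodes a single Cyrillic letter to bytes and decodes it back. The class name EncodeDecodeDemo and the helper toHex are made up for this example, and UTF-8 is just one possible choice of rules.

```java
import java.nio.charset.StandardCharsets;

public class EncodeDecodeDemo {
    // Hypothetical helper: renders bytes as space-separated hex, e.g. "D0 A0".
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letter = "\u0420"; // CYRILLIC CAPITAL LETTER ER

        // Sender: text -> bytes, following the UTF-8 rules.
        byte[] encoded = letter.getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(encoded)); // D0 A0

        // Receiver: bytes -> text, following the same rules.
        String decoded = new String(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(letter)); // true
    }
}
```

The byte stream only ever sees the two bytes D0 A0; whether they mean a Cyrillic letter is decided entirely by the rules used to read them back.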

    "Русский язык" can be represented as bytes in any encoding that supports Cyrillic characters: any of the Unicode encodings (UTF-8, UTF-16, UTF-32), as well as Windows-1251, KOI8-R, KOI8-U, ISO-8859-5...

    That doesn't mean these encodings are compatible with each other. They are all mutually incompatible when it comes to encoding Cyrillic script, so text encoded in one of the encodings must be decoded with that same encoding.
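To make the incompatibility concrete, here is a minimal sketch that encodes the same text with different charsets and then decodes it with matching and mismatching ones. The class EncodingMismatch and its roundTrip helper are hypothetical names for this illustration. The windows-1251 part is guarded, since only a handful of charsets (UTF-8, UTF-16 and a few others) are guaranteed to exist on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encode with one charset, decode with another (possibly different) one.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "Русский язык", written with escapes so the source file's own encoding does not matter.
        String text = "\u0420\u0443\u0441\u0441\u043A\u0438\u0439 \u044F\u0437\u044B\u043A";

        // Same text, different byte sequences.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);    // 23
        System.out.println(text.getBytes(StandardCharsets.UTF_16BE).length); // 24

        // Matching rules restore the text; mismatched rules do not.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));    // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE).equals(text)); // false

        // windows-1251 is optional on a JVM, so check before using it.
        if (Charset.isSupported("windows-1251")) {
            Charset cp1251 = Charset.forName("windows-1251");
            System.out.println(text.getBytes(cp1251).length); // 12, one byte per character
            System.out.println(roundTrip(text, cp1251, StandardCharsets.UTF_8).equals(text)); // false
        }
    }
}
```

Note that a mismatched decode does not throw an exception here; it silently produces different characters, which is why such bugs tend to surface late.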

    .getBytes() uses the platform default encoding, which in your case happened to be one that supports Cyrillic script. You might believe it is UTF-8, but if you are on Windows it is far more likely to be Cp1251 (on Java 18 and later the default is UTF-8 unless it is overridden). Don't fall into the trap of assuming that, just because you used "Unicode characters", your files are physically encoded in a UTF encoding. That assumption leads to encoding problems.
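If you are unsure which encoding your own machine picks, a quick check along these lines may help. This is only a sketch: the class name DefaultCharsetCheck and the method noArgUsesDefault are invented for the example, and the printed charset will differ from system to system.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetCheck {
    // True when the no-argument getBytes() produces the same bytes as an
    // explicit call with the JVM's default charset.
    public static boolean noArgUsesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // What the JVM uses whenever no charset is given; depends on platform and JDK version.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // "Русский язык", written with escapes so the source file's own encoding does not matter.
        String text = "\u0420\u0443\u0441\u0441\u043A\u0438\u0439 \u044F\u0437\u044B\u043A";
        System.out.println(noArgUsesDefault(text)); // expected: true
    }
}
```

Running this on the machine that produced the file tells you which encoding the file is actually in.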

    So always be explicit about the encoding, so that your program behaves the same on every platform and you always know which encoding the files it creates are in. With your code, you could have done this:

    String str = "Русский язык ";
    FileOutputStream fos = new FileOutputStream("file path");
    fos.write(str.getBytes("UTF-8"));
    fos.flush();
    fos.close();
    

    Or as suggested by the other answer:

    String str = "Русский язык ";
    OutputStreamWriter osw = new OutputStreamWriter(
            new FileOutputStream("file path"), "UTF-8"
    );
    osw.write(str);
    osw.flush();
    osw.close();
    

    These two are equivalent in effect: in both, the text is converted to bytes according to UTF-8 rules.
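One way to convince yourself is to capture the bytes each approach produces and compare them. In the sketch below, a ByteArrayOutputStream stands in for the FileOutputStream so that nothing touches the disk; the class SameBytesDemo and its two methods are hypothetical names, and StandardCharsets.UTF_8 is used in place of the "UTF-8" string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SameBytesDemo {
    // Approach 1: convert the text to bytes yourself, then write the bytes.
    public static byte[] viaGetBytes(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = sink) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    // Approach 2: let an OutputStreamWriter do the conversion while writing.
    public static byte[] viaWriter(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "Русский язык " from the question, written with escapes.
        String text = "\u0420\u0443\u0441\u0441\u043A\u0438\u0439 \u044F\u0437\u044B\u043A ";
        System.out.println(Arrays.equals(viaGetBytes(text), viaWriter(text))); // true
    }
}
```

The try-with-resources blocks flush and close the streams automatically, which replaces the explicit flush() and close() calls in the snippets above.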