I have written an application in Java and duplicated it in C#. The application reads and writes text files with tab-delimited data to be used by HMI software. The HMI software requires UTF or ANSI encoding for the degree symbol to display correctly; otherwise I would just use ASCII, which seems to work fine.

The C# application can open files saved by either application with no problem. The Java application reads the files it saved itself perfectly, but a small problem crops up when it reads the files saved with C#: it throws a NumberFormatException when parsing the first character in the file to an int. This character is always a "1".

I have opened both files in EditPad Lite and they appear to be identical, and both report the encoding as UTF-16LE. I'm racking my brain on this; any help would be appreciated.
List<String> lines = FileUtils.readLines(file, "UTF-16LE");
String[] line = lines.get(0).split("\\t");
Integer.parseInt(line[0]); // NumberFormatException here, but only for files saved from C#
I cannot see any difference between the file saved in C# and the one saved in Java.
[Screenshot of the data in EditPad Lite]
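For reference, dumping the first few raw bytes makes any difference visible that a text editor hides. A minimal sketch (the file names are just placeholders); a UTF-16LE BOM would show up as FF FE at the very start:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DumpBytes {
    public static void main(String[] args) throws IOException {
        // Print the first few bytes of each file in hex.
        for (String path : new String[] {"fromJava.txt", "fromCSharp.txt"}) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            StringBuilder hex = new StringBuilder(path + ": ");
            for (int i = 0; i < Math.min(8, bytes.length); i++) {
                hex.append(String.format("%02X ", bytes[i]));
            }
            System.out.println(hex);
        }
    }
}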
I can work around it like this, but I don't understand why it's necessary:

// If the first field is one character longer than expected,
// drop the stray leading character.
if (lines.get(0).split("\\t")[0].length() == 2) {
    lines.set(0, lines.get(0).substring(1));
}
Your .NET code is probably writing a BOM (byte order mark). Compliant Unicode readers strip off any BOM, since it is metadata, not part of the text data.
Your Java code explicitly specifies the byte order:
FileUtils.readLines(file, "UTF-16LE");
It's somewhat of a Catch-22: if the source has a BOM, you can read it as "UTF-16" and the BOM is consumed for you. If it doesn't, you can read it as "UTF-16LE" or "UTF-16BE", provided you know which one it is. The catch here is that "UTF-16LE" gives a BOM no special treatment: the leading FF FE bytes are decoded as a literal U+FEFF character, which ends up glued to your "1" and makes Integer.parseInt fail.
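You can see this with a small self-contained sketch (the byte values are just the little-endian BOM followed by "1" encoded as UTF-16LE):

import java.nio.charset.StandardCharsets;

public class BomDemo {
    public static void main(String[] args) {
        // FF FE is the little-endian BOM; 31 00 is "1" in UTF-16LE.
        byte[] bytes = {(byte) 0xFF, (byte) 0xFE, 0x31, 0x00};

        String utf16   = new String(bytes, StandardCharsets.UTF_16);    // BOM consumed: "1"
        String utf16le = new String(bytes, StandardCharsets.UTF_16LE);  // BOM kept: "\uFEFF1"

        System.out.println(utf16.length());   // 1 -> Integer.parseInt succeeds
        System.out.println(utf16le.length()); // 2 -> Integer.parseInt would throw
    }
}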
So, either write it with a BOM and read it without specifying the byte order, or write it without a BOM and read it specifying the byte order.
With a BOM:
[C#]
File.WriteAllLines(file, lines, Encoding.Unicode); // UTF-16LE, with BOM
[Java]
FileUtils.readLines(file, "UTF-16");
Without a BOM:
[C#]
File.WriteAllLines(file, lines, new UnicodeEncoding(false, false)); // little-endian, no BOM
[Java]
FileUtils.readLines(file, "UTF-16LE");
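If you don't control the writer and might receive either variant, you can also read as "UTF-16LE" and defensively strip a leading U+FEFF if one survives decoding. A sketch, assuming Commons IO's FileUtils as in your code (the method name is mine):

import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.commons.io.FileUtils;

public class SafeReader {
    // Reads little-endian UTF-16 text, tolerating an optional BOM.
    static List<String> readLinesTolerantOfBom(File file) throws IOException {
        List<String> lines = FileUtils.readLines(file, "UTF-16LE");
        if (!lines.isEmpty() && lines.get(0).startsWith("\uFEFF")) {
            lines.set(0, lines.get(0).substring(1));
        }
        return lines;
    }
}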