java encoding xom

Why is the output different when serializing to different streams? (Java)


I have a problem with an XML document that contains special characters (the problematic string is löööschee`*‘‘§a). The XML comes as an XOM object in Java. While investigating the problem, I tried to print the text of the XML with a serializer, and I noticed that streaming directly to System.out was the only way to get the correct string.

Here is the code I used to print the XML:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Serializer;

Element pEntry; // this is the XOM object I get, it contains the XML
Document document = pEntry.getDocument();
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Serializer serializer = new Serializer(stream);      // serializes into the in-memory buffer
Serializer serializer2 = new Serializer(System.out); // serializes straight to the console
try {
    serializer.write(document);
    serializer2.write(document);
} catch (IOException e) {
    System.out.println(e.getMessage());
}
System.out.println("#####################################################################");
System.out.println(stream); // implicitly calls stream.toString()

So serializer2 writes directly to System.out, and there the string is as it should be. The System.out.println call, however, prints the string as l??????schee`*????????a. I tried many different encodings (the default encoding for the serializer is "UTF-8", which seems correct), but the only way I found to print the correct string is streaming directly to System.out.
I also printed the bytes of the first stream (the one that does not work), and this was the output:
6c ffffffc3 ffffffb6 ffffffc3 ffffffb6 ffffffc3 ffffffb6 73 63 68 65 65 60 2a ffffffe2 ffffff80 ffffff98 ffffffe2 ffffff80 ffffff98 ffffffc2 ffffffa7 61.
I don't really know if this is correct, and I can't print the bytes that are streamed directly to System.out. I saw that c3 b6, for example, should be an ö, which would be correct, but I don't know about the ffffff prefixes.
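For what it's worth, an ffffff prefix like the one above is what Java produces when a negative byte is widened to an int before being formatted as hex (for example with Integer.toHexString). Without the prefixes, the dump matches the UTF-8 encoding of the string. The following is only a sketch of how the bytes could be printed without the sign extension; the class name HexDump and the helper toHex are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Formats each byte as two hex digits. Masking with 0xff keeps only the
    // low 8 bits, so negative bytes are not sign-extended to ffffffXX.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte b = (byte) 0xc3;
        System.out.println(Integer.toHexString(b));        // ffffffc3 (sign-extended)
        System.out.println(Integer.toHexString(b & 0xff)); // c3

        // The problematic string, written with Unicode escapes so the example
        // does not depend on the encoding of the source file.
        String text = "l\u00f6\u00f6\u00f6schee`*\u2018\u2018\u00a7a";
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```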
Why are they different, even though they use the same encoding?
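One likely factor: System.out.println(stream) goes through ByteArrayOutputStream.toString(), which decodes the buffer with the platform default charset, while the Serializer hands its UTF-8 bytes to System.out unchanged. The sketch below only illustrates that the same bytes decode differently depending on the charset. It assumes, purely for illustration, an ASCII-like default charset (the actual default on the affected machine is not known); DecodeDemo and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // The problematic string, written with Unicode escapes so the example
    // does not depend on the encoding of the source file.
    static final String ORIGINAL = "l\u00f6\u00f6\u00f6schee`*\u2018\u2018\u00a7a";

    // Encodes the text as UTF-8 (the Serializer's default encoding) and then
    // decodes the resulting bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String good = roundTrip(ORIGINAL, StandardCharsets.UTF_8);
        // US_ASCII is an assumed stand-in for a non-UTF-8 default charset.
        String bad = roundTrip(ORIGINAL, StandardCharsets.US_ASCII);

        System.out.println(good.equals(ORIGINAL)); // true
        System.out.println(bad.equals(ORIGINAL));  // false
        // Every non-ASCII byte became its own replacement character, so the
        // decoded string has one char per byte.
        System.out.println(bad.length());          // 23
    }
}
```

With this assumed charset, 14 of the 23 bytes turn into replacement characters, which would be consistent with the 6 + 8 question marks in the broken output, but that is an inference and not something the output alone proves.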

Other things I tried:


Solution

  • Putting the line System.setOut(new PrintStream(System.out, true, StandardCharsets.UTF_8)); above the console output solved the problem. The console now always shows the correct string.
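
A minimal, self-contained sketch of that fix is shown here, under a few assumptions: it needs Java 10 or later for the PrintStream(OutputStream, boolean, Charset) constructor and for ByteArrayOutputStream.toString(Charset), the buffer is filled by hand as a stand-in for serializer.write(document), and the names Utf8Console and printAsUtf8 are invented for the example. Decoding the buffer explicitly as UTF-8 is an extra precaution that is not part of the original solution.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Returns the bytes a UTF-8 PrintStream emits for the given text.
    static byte[] printAsUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, StandardCharsets.UTF_8);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Make System.out encode characters as UTF-8 from here on.
        System.setOut(new PrintStream(System.out, true, StandardCharsets.UTF_8));

        // Stand-in for serializer.write(document): UTF-8 bytes in a buffer.
        byte[] data = "l\u00f6\u00f6\u00f6schee`*\u2018\u2018\u00a7a"
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        stream.write(data, 0, data.length);

        // Decode with the charset the bytes were written in, instead of
        // relying on the platform default used by toString().
        System.out.println(stream.toString(StandardCharsets.UTF_8));
    }
}
```

Whether the characters then render correctly still depends on the terminal or IDE console interpreting the output as UTF-8.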