java xml jdom-2

Convert XML-File to string without manipulation or optimization in Java


I have some trouble with JDOM2, which I use to work with XML files. I want to convert an XML file to a string without any manipulation or optimization.

This is my Java code for that:

    SAXBuilder builder = new SAXBuilder();
    File xmlFile = f;

    try
    {
        Document document = (Document) builder.build(xmlFile);

        xml = new XMLOutputter().outputString(document);

    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    return xml;

But when I compare my string with the original XML file I notice some changes.

The original:

<?xml version="1.0" encoding="windows-1252"?>
<xmi:XMI xmi:version="2.1" xmlns:uml="http://schema.omg.org/spec/UML/2.0" xmlns:xmi="http://schema.omg.org/spec/XMI/2.1" xmlns:thecustomprofile="http://www.sparxsystems.com/profiles/thecustomprofile/1.0" xmlns:SoaML="http://www.sparxsystems.com/profiles/SoaML/1.0">

And the string:

<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1" xmlns:SoaML="http://www.sparxsystems.com/profiles/SoaML/1.0" xmlns:thecustomprofile="http://www.sparxsystems.com/profiles/thecustomprofile/1.0" xmlns:uml="http://schema.omg.org/spec/UML/2.0" xmi:version="2.1">

And all umlauts (ä, ö, ü) are changed too. I get something like '�' instead of 'ä'.

Is there any way to stop that behavior?


Solution

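  • Firstly, as others have stated, you shouldn't use any XML processing here. A parser builds a document model and re-serializes it, so details that carry no meaning in XML (attribute order, the order of namespace declarations, the declared encoding) are not preserved. Just read the file as a text file.

    Secondly, your umlaut characters showing up as '�' is due to an incorrect charset (encoding) being used. The charset error may be in your code, or it may be in the XML file itself.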

    The original XML file contains encoding="windows-1252", but it's unusual for XML to be encoded in anything other than UTF-8, so I suspect the file is really a UTF-8 file and the encoding it claims to use is not correct.

    Try forcing UTF-8 when reading the file. It's good practice, regardless, to specify the charset when converting bytes to text:

    String xml = new String(
        Files.readAllBytes(xmlFile.toPath()), StandardCharsets.UTF_8);
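If forcing UTF-8 still produces '�', the file may really be encoded as windows-1252, as its declaration says. The sketch below is only an illustration of that case: the class name `RawXmlReader`, the helper `readAs`, and the sample content are made up for this example, and it assumes the JVM provides the `windows-1252` charset (typical JDKs do). It writes a small windows-1252 file and decodes the same bytes both ways.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of JDOM or the original question.
class RawXmlReader {

    // Reads the raw bytes and decodes them with the given charset.
    // No XML parsing happens, so the text is returned exactly as stored.
    public static String readAs(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: windows-1252 is available on this JVM.
        Charset cp1252 = Charset.forName("windows-1252");

        // Sample file containing an umlaut (\u00E4 is 'ä'), stored as windows-1252.
        Path sample = Files.createTempFile("sample", ".xml");
        Files.write(sample, "<name>B\u00E4r</name>".getBytes(cp1252));

        // Decoding with the charset the bytes were written in keeps the umlaut.
        String matching = readAs(sample, cp1252);

        // Decoding the same bytes as UTF-8 hits an invalid byte sequence,
        // which Java replaces with U+FFFD (shown as '�').
        String mismatched = readAs(sample, StandardCharsets.UTF_8);

        System.out.println(matching);
        System.out.println(mismatched);

        Files.delete(sample);
    }
}
```

Whichever charset yields readable umlauts is the one the file was actually saved in; the `encoding` attribute in the XML declaration is only a claim and can be wrong in either direction.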