maven-3 woodstox

Woodstox parser works fine in test run in Eclipse, but fails from command line


One of my JUnit tests uses (behind the scenes) the Woodstox parser.

When I run the test from within Eclipse, the test succeeds as expected.

But running the same test on the command line, using

mvn clean test -Dtest=com.example.MyClassTest#someParserTest

causes the test to fail with the following exception:

Error on line 114 column 21
  SXXP0003: Error reported by XML parser: Invalid UTF-8 middle byte 0x3f (at char #4174, byte #3999)
    ...
    at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:314)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:205)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:55)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:961)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4580)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1063)
    at com.ctc.wstx.sax.WstxSAXParser.fireEvents(WstxSAXParser.java:524)
    at com.ctc.wstx.sax.WstxSAXParser.parse(WstxSAXParser.java:452)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
    at net.sf.saxon.event.Sender.send(Sender.java:171)
    at net.sf.saxon.jaxp.IdentityTransformer.transform(IdentityTransformer.java:363)

I inspected the InputStream that is being parsed. Its contents are identical in both cases.

Also, there is no "line 114 column 21" in the InputStream. Line 114 ends on column 11.

How can I investigate what causes the different behavior?


Solution

  • It turned out that a library I was using made incorrect assumptions about the environment's default character encoding (also called the platform's default charset).

    In the Eclipse environment, calling Charset.defaultCharset() returned UTF-8, while in the command line environment it returned CP1252.
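One way to investigate such a difference is to print the charset settings from inside the JVM that actually runs the test, once from Eclipse and once from Maven. The following is only a minimal sketch; the class name `DefaultCharsetCheck` and the `describe()` helper are made up for illustration and are not part of the library or test in question.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    /** Summarizes the charset-related settings of the running JVM. */
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this (or log describe() from the failing test) in both environments
        // and compare the two lines.
        System.out.println(describe());
    }
}
```

If the two environments print different values, any code path that relies on the default charset becomes a suspect.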

    Many standard and third-party Java APIs behave differently depending on the platform's default charset whenever no charset is passed to them explicitly.
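Well-known JDK examples of such APIs are `String.getBytes()`, `new String(byte[])` and `new InputStreamReader(InputStream)`, all of which fall back to the default charset when none is given. The sketch below illustrates the effect under stated assumptions: the class name and sample text are hypothetical, and ISO-8859-1 is used as a stand-in for CP1252 because it is guaranteed to exist on every JVM and encodes "é" as the same single byte 0xE9.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    /** Strictly decodes the bytes as UTF-8; returns null if they are not valid UTF-8. */
    static String decodeUtf8OrNull(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input instead of silently replacing it.
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 au lait"; // "café au lait", escaped to avoid source-encoding issues

        // No-argument getBytes() uses the platform default charset,
        // so its result can differ between Eclipse and the command line.
        byte[] implicit = text.getBytes();
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("default charset:     " + Charset.defaultCharset());
        System.out.println("implicit byte count: " + implicit.length);
        System.out.println("ISO-8859-1 bytes:    " + latin1.length);
        System.out.println("UTF-8 bytes:         " + utf8.length);

        // Bytes written with a single-byte charset are not valid UTF-8 here:
        // 0xE9 looks like the start of a multi-byte sequence, but a space follows.
        System.out.println("latin1 valid UTF-8?  " + (decodeUtf8OrNull(latin1) != null));
        System.out.println("utf8 valid UTF-8?    " + (decodeUtf8OrNull(utf8) != null));
    }
}
```

A mismatch of this kind, where bytes are produced with one charset and a parser reads them as UTF-8, is consistent with an "Invalid UTF-8 middle byte" error, although the exact code path inside the library is not shown in the question.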

    To resolve my issue, I had to update that library to explicitly use the correct character set.
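The actual change made to the library is not shown here. As a rough sketch of what "explicitly use the correct character set" typically looks like, assuming UTF-8 is the intended encoding, the hypothetical helpers below pass `StandardCharsets.UTF_8` wherever bytes and characters are converted, so the result no longer depends on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetExample {

    /** Encodes XML text as UTF-8 bytes, independent of the platform default charset. */
    static InputStream toUtf8Stream(String xml) {
        // Instead of xml.getBytes(), which would use the default charset.
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the whole stream as UTF-8 text, independent of the platform default charset. */
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // Instead of new InputStreamReader(in), which would use the default charset.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<name>caf\u00e9</name>";
        String roundTripped = readUtf8(toUtf8Stream(xml));
        // The round trip preserves the text in every environment.
        System.out.println(xml.equals(roundTripped));
    }
}
```

When the offending library cannot be changed, a commonly used workaround is to start the test JVM with `-Dfile.encoding=UTF-8` (for Maven, for example through the Surefire `argLine` setting). That only hides the wrong assumption, though, and is not what was done in this case.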