Tags: java, xml, stax, axiom, woodstox

DTD parsing with axiom


I'm trying to use Axiom 1.2.22 with Woodstox 6.2.6 to parse an XML document that contains a DOCTYPE declaration. (I'm using OpenJDK 11, but that shouldn't make any difference.) I'm getting the same error that was mentioned in "How to ignore DTD parsing in Apache's AXIOM":

Cannot create OMDocType because the XMLStreamReader doesn't support the DTDReader extension

According to https://issues.apache.org/jira/browse/AXIOM-475, this was supposed to be fixed in Axiom 1.2.16, but the bug seems to be back.
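For context on why Axiom needs an extension at all, here is a JDK-only sketch (the class `DoctypeProbe` and its method are hypothetical, not part of Axiom). It walks a document with the standard StAX API and stops at the DTD event. As far as I know, an `OMDocType` carries the root name, public ID and system ID separately, but a plain `XMLStreamReader` only offers `getText()` for a DTD event, and what that returns is implementation-specific. That gap is presumably what the `DTDReader` extension fills.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DoctypeProbe {
    // Returns getText() for the first DTD event, or null if the document has no DOCTYPE.
    static String firstDtdText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Report DTD events, but never resolve external entities (no network access).
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.DTD) {
                    // Standard StAX exposes only this string for the DTD event;
                    // root name, public ID and system ID are not available as separate values.
                    return reader.getText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]><root>hi</root>";
        System.out.println(firstDtdText(xml));
    }
}
```

The internal-subset-only DOCTYPE keeps the example self-contained; a real document with an external system ID would need an entity resolver or external entity support.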

Example snippet:

    InputStream is = Test.class.getResourceAsStream("xml-with-dtd.xml");
    OMXMLParserWrapper builder = OMXMLBuilderFactory.createStAXOMBuilder(XMLInputFactory.newFactory().createXMLStreamReader(is));
    OMElement result = builder.getDocumentElement();

Am I using incompatible versions? I also tried Woodstox 5.0.0, which throws the same error, and I verified that XMLInputFactory.newFactory() really returns the Woodstox XMLInputFactory. These are the Maven dependencies I use (I've omitted some exclusions related to logging and duplicated classes):

    <dependency>
      <groupId>com.fasterxml.woodstox</groupId>
      <artifactId>woodstox-core</artifactId>
      <version>6.2.6</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.woodstox</groupId>
      <artifactId>stax2-api</artifactId>
      <version>4.2.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.ws.commons.axiom</groupId>
      <artifactId>axiom-impl</artifactId>
      <version>1.2.22</version>
    </dependency>
    <dependency>
      <groupId>org.apache.ws.commons.axiom</groupId>
      <artifactId>axiom-api</artifactId>
      <version>1.2.22</version>
    </dependency>
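One way to do the "which XMLInputFactory is actually used" check is a small JDK-only sketch like the one below (the class `StaxImplCheck` is hypothetical). It prints the factory class and, if the jar's manifest declares one, its implementation version. The version matters because, judging by the patch in the answer, Axiom's dialect detection switches on the Woodstox major version.

```java
import javax.xml.stream.XMLInputFactory;

class StaxImplCheck {
    // Describes which StAX implementation XMLInputFactory.newFactory() resolves to.
    static String describeFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        Class<?> impl = factory.getClass();
        Package pkg = impl.getPackage();
        // The implementation version is read from the jar manifest and may be null
        // (for example for the JDK's built-in parser).
        String version = pkg == null ? null : pkg.getImplementationVersion();
        return impl.getName() + " (version: " + (version == null ? "unknown" : version) + ")";
    }

    public static void main(String[] args) {
        // With woodstox-core on the classpath this should name a com.ctc.wstx class.
        System.out.println(describeFactory());
    }
}
```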

Update: It looks a lot like the Axiom code tries to determine which DTDReader to use from a configuration property. Unfortunately, setting the property DTDReader.PROPERTY on the XMLInputFactory to any value results in the following stack trace:

    Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized property 'org.apache.axiom.ext.stax.DTDReader'
        at com.ctc.wstx.api.CommonConfig.reportUnknownProperty(CommonConfig.java:167)
        at com.ctc.wstx.api.CommonConfig.setProperty(CommonConfig.java:158)
        at com.ctc.wstx.api.ReaderConfig.setProperty(ReaderConfig.java:35)
        at com.ctc.wstx.stax.WstxInputFactory.setProperty(WstxInputFactory.java:400)
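As I understand Axiom's DTDReader javadoc, DTDReader.PROPERTY is not meant to be set on the factory: it is the name under which a consumer asks an XMLStreamReader for the extension via getProperty(...). That would explain why Woodstox rejects it in setProperty. The sketch below (hypothetical class and method names) shows the standard-StAX way to probe a factory property before setting it, so an unsupported name is reported instead of throwing IllegalArgumentException.

```java
import javax.xml.stream.XMLInputFactory;

class PropertyGuard {
    // Same property name as in the stack trace above.
    static final String DTD_READER_PROPERTY = "org.apache.axiom.ext.stax.DTDReader";

    // Sets a factory property only if the implementation reports it as supported;
    // returns false instead of letting setProperty throw IllegalArgumentException.
    static boolean trySetProperty(XMLInputFactory factory, String name, Object value) {
        if (!factory.isPropertySupported(name)) {
            return false;
        }
        factory.setProperty(name, value);
        return true;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        boolean set = trySetProperty(factory, DTD_READER_PROPERTY, Boolean.TRUE);
        System.out.println("DTDReader property accepted by factory: " + set);
        // A standard property, by contrast, is accepted.
        System.out.println("SUPPORT_DTD accepted: "
                + trySetProperty(factory, XMLInputFactory.SUPPORT_DTD, Boolean.TRUE));
    }
}
```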

Solution

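  • I'm not sure why it didn't work when I tried it with Woodstox 5, but this small patch against Axiom 1.2.22 solves the problem, at least for Woodstox 6.2.6. StAXDialectDetector picks a dialect based on the Woodstox major version, and without the patch version 6 reaches the default branch, which returns null: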

    Index: axiom-api/src/main/java/org/apache/axiom/util/stax/dialect/StAXDialectDetector.java
    ===================================================================
    --- axiom-api/src/main/java/org/apache/axiom/util/stax/dialect/StAXDialectDetector.java (revision 1891409)
    +++ axiom-api/src/main/java/org/apache/axiom/util/stax/dialect/StAXDialectDetector.java (working copy)
    @@ -274,6 +274,7 @@
                         return new Woodstox4Dialect(version.getComponent(1) == 0 && version.getComponent(2) < 11
                                 || version.getComponent(1) == 1 && version.getComponent(2) < 3);
                     case 5:
    +                case 6:
                         return new Woodstox4Dialect(false);
                     default:
                         return null;
    

    Update: Version 1.3.0 of Axiom also fixes the problem.
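To make the effect of the one-line patch concrete, here is a simplified, hypothetical model of the Woodstox branch of StAXDialectDetector (it is not Axiom code, and `DialectDispatchModel` is an invented name). Only the cases visible in the diff are modeled: 5.x maps to Woodstox4Dialect, and any unlisted major version returns null, which presumably leaves Axiom without a Woodstox dialect and therefore without the DTDReader extension. Note that in this model 5.x already maps to a dialect, so it does not explain the failure observed with Woodstox 5.0.0.

```java
class DialectDispatchModel {
    // Hypothetical model of the version switch shown in the patch.
    // Returns the dialect class name, or null where the real code returns null.
    static String dialectFor(String woodstoxVersion, boolean patched) {
        int major = Integer.parseInt(woodstoxVersion.split("\\.")[0]);
        switch (major) {
            case 5:
                return "Woodstox4Dialect";
            case 6:
                // Without the patch there is no "case 6", so 6.x reaches default.
                return patched ? "Woodstox4Dialect" : null;
            default:
                // Other major versions (including 4.x, which the real code handles) are not modeled.
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(dialectFor("6.2.6", false)); // null
        System.out.println(dialectFor("6.2.6", true));  // Woodstox4Dialect
        System.out.println(dialectFor("5.0.0", false)); // Woodstox4Dialect
    }
}
```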