I'm using Woodstox to parse some SVG files. This works only while I'm online. Offline, it seems that Woodstox is not used and parsing falls back to a default parser, which is far slower (5 min vs. 15 s). With my current SVGs it also throws exceptions.
Am I doing something wrong? Why is Woodstox not being used offline?
Used Maven dependency:
<dependency>
    <groupId>com.fasterxml.woodstox</groupId>
    <artifactId>woodstox-core</artifactId>
    <version>5.0.3</version>
</dependency>
Code for parsing:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = inputFactory.createXMLEventReader(new FileInputStream(svgFile));
while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    ...
}
This is the exception thrown by reader.nextEvent():
com.ctc.wstx.exc.WstxIOException: www.w3.org
at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:705)
at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:466)
at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:326)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3836)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2168)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)
This is one of my SVGs. Is it malformed?
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 16.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="64px"
height="64px" viewBox="0 0 64 64" enable-background="new 0 0 64 64" xml:space="preserve">
<g id="Ebene_1">
<path fill="currentColor" d="M38.338,9.412H12.592v47.438h38.521V22.296L38.338,9.412z M46.728,51.866H17.191V14.129h14.771v12.577
h14.766V51.866z"/>
</g>
</svg>
The com.ctc.wstx frames in your stack trace show that Woodstox is in fact being used. The parser is simply trying to load the external DTD subset from the URL specified in the DOCTYPE declaration: "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd". This is what the XML specification dictates it has to do (or use some mechanism for obtaining a copy via the public ID). This must happen regardless of whether DTD validation is enabled: the DTD subset may also contain ENTITY declarations, and without reading it there is no way to know whether it does.
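As a rough illustration of the "copy via public ID" route, the standard StAX XMLResolver hook can hand the parser a local DTD instead of letting it go to the network. This is only a sketch under assumptions: the class name, the SAMPLE_SVG string (a trimmed-down stand-in for the file in the question), and the comment-only LOCAL_DTD_STUB are all made up for the example. A real setup would return a stream over a bundled copy of svg11.dtd, so that any entities or default attributes it declares remain available.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class LocalDtdResolverSketch {

    static final String SVG11_PUBLIC_ID = "-//W3C//DTD SVG 1.1//EN";

    // Assumption: placeholder content. In practice, open a local copy of svg11.dtd here.
    static final String LOCAL_DTD_STUB = "<!-- local stand-in for svg11.dtd -->";

    // Hypothetical, trimmed-down SVG with the same DOCTYPE as in the question.
    static final String SAMPLE_SVG =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" "
            + "\"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n"
            + "<svg version=\"1.1\" xmlns=\"http://www.w3.org/2000/svg\" width=\"64px\" height=\"64px\">\n"
            + "<g id=\"Ebene_1\"><path fill=\"currentColor\" d=\"M0,0h8v8z\"/></g>\n"
            + "</svg>\n";

    // Returns the local names of all start elements, resolving the SVG DTD locally.
    static List<String> elementNames(InputStream in) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        inputFactory.setXMLResolver((publicId, systemId, baseUri, namespace) -> {
            boolean isSvgDtd = SVG11_PUBLIC_ID.equals(publicId)
                    || (systemId != null && systemId.endsWith("/svg11.dtd"));
            if (isSvgDtd) {
                return new ByteArrayInputStream(LOCAL_DTD_STUB.getBytes(StandardCharsets.UTF_8));
            }
            return null; // anything else: let the parser use its default resolution
        });

        List<String> names = new ArrayList<>();
        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(SAMPLE_SVG.getBytes(StandardCharsets.UTF_8));
        System.out.println(elementNames(in));
    }
}
```

Returning null from the resolver leaves all other external references to the parser's default behaviour, so only the SVG DTD lookup is redirected.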
However, if there are no entities and you do not want DTD validation, you can simply disable DTD handling altogether with:
inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
which prevents the DTD from being read at all. You will also quickly find out if there were any entities you were missing. :)
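Put together with the parsing loop from the question, this could look roughly like the sketch below. The class name, the countStartElements helper, and the SAMPLE_SVG string (a shortened stand-in for the real file) are invented for illustration; with a file you would pass a FileInputStream as in the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class OfflineSvgParsing {

    // Hypothetical, shortened SVG that keeps the DOCTYPE from the question.
    static final String SAMPLE_SVG =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" "
            + "\"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n"
            + "<svg version=\"1.1\" xmlns=\"http://www.w3.org/2000/svg\" width=\"64px\" height=\"64px\">\n"
            + "<g id=\"Ebene_1\"><path fill=\"currentColor\" d=\"M0,0h8v8z\"/></g>\n"
            + "</svg>\n";

    // Walks all events with DTD support switched off and counts the start elements.
    static int countStartElements(InputStream in) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Must be set before the reader is created; the external DTD is then never fetched.
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(SAMPLE_SVG.getBytes(StandardCharsets.UTF_8));
        System.out.println("start elements: " + countStartElements(in));
    }
}
```

The property has to be set on the factory before createXMLEventReader is called, since readers pick up the factory configuration at creation time.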