I am using QXmlSimpleReader to parse an XML file with internally defined entities in it, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ELEMENT root (#PCDATA)>
<!ENTITY ent "some internally defined entity">
]>
<root>
text &ent; text
</root>
I am handling the document with a QXmlDefaultHandler subclass, and the most I can do about internal entities is to have their usage reported.
By default, all internally defined entities (&ent; in the example above) are automatically expanded into their replacement text. How can I change this behavior so that I can specify the string each entity is replaced with? I am also fine with switching to another Qt XML reader if that is required to make it work.
I found one way to do it, although it is more of a hack than a proper solution, since it doesn't stop Qt from actually replacing the entity with its resolved text. It is just a workaround in which the resolved characters are ignored.
First, make the QXmlSimpleReader report entities by setting the appropriate feature, and register the handler for both content and lexical events:
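QXmlSimpleReader xmlReader;
// Ask the reader to call startEntity()/endEntity() around each entity reference.
xmlReader.setFeature("http://qt-project.org/xml/features/report-start-end-entity", true);
// handler is a pointer to the QXmlDefaultHandler subclass; it serves as both handlers.
xmlReader.setContentHandler(handler);
xmlReader.setLexicalHandler(handler);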
Next, in the handler class used above, override bool QXmlLexicalHandler::startEntity(const QString &name) and bool QXmlLexicalHandler::endEntity(const QString &name), and keep a flag recording whether the reader is currently inside an entity. While it is, ignore the input passed to bool QXmlContentHandler::characters(const QString &ch) and emit your own replacement text from startEntity or endEntity instead.
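The state tracking can be sketched roughly as follows. This is a Qt-free model of the logic only, so that it compiles on its own: the class name EntityAwareHandler, the replacement table, and the replayExample helper are all hypothetical, and it uses std::string where the real overrides take const QString &. In an actual QXmlDefaultHandler subclass, the three methods would be the overrides named above, with the same bool return convention (true means "keep parsing").

```cpp
#include <map>
#include <string>
#include <utility>

// Qt-free model of the handler state. The method names mirror the
// QXmlLexicalHandler / QXmlContentHandler callbacks, but nothing here is Qt API.
class EntityAwareHandler {
public:
    // Hypothetical table: entity name -> text to emit instead of the expansion.
    explicit EntityAwareHandler(std::map<std::string, std::string> replacements)
        : replacements_(std::move(replacements)) {}

    // Called when the reader enters an entity reference.
    bool startEntity(const std::string &name) {
        if (entityDepth_ == 0) {
            auto it = replacements_.find(name);
            // Assumption: unknown entities are written back as a literal reference.
            output_ += (it != replacements_.end()) ? it->second : "&" + name + ";";
        }
        ++entityDepth_;  // a counter rather than a bool, so nested entities stay ignored
        return true;
    }

    // Called when the reader leaves the entity again.
    bool endEntity(const std::string &) {
        if (entityDepth_ > 0) --entityDepth_;
        return true;
    }

    // Character data: kept only when it does not come from an entity expansion.
    bool characters(const std::string &ch) {
        if (entityDepth_ == 0) output_ += ch;
        return true;
    }

    const std::string &output() const { return output_; }

private:
    std::map<std::string, std::string> replacements_;
    std::string output_;
    int entityDepth_ = 0;
};

// Replays the callback order expected for "text &ent; text" with entity
// reporting enabled: text chunk, startEntity, expanded text, endEntity, text chunk.
std::string replayExample() {
    std::map<std::string, std::string> table{{"ent", "[custom]"}};
    EntityAwareHandler handler(table);
    handler.characters("text ");
    handler.startEntity("ent");
    handler.characters("some internally defined entity");  // ignored
    handler.endEntity("ent");
    handler.characters(" text");
    return handler.output();
}
```

Doing the substitution in startEntity rather than endEntity keeps the replacement text in document order without any extra buffering; either place works as long as characters is muted in between.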