I have a script that parses XML using lxml.etree
:
from lxml import etree
parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
tree = etree.parse('main.xml', parser=parser)
I need load_dtd=True
and resolve_entities=True
be have &emptyEntry;
from globals.xml
resolved:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE map SYSTEM "globals.xml" [
<!ENTITY dirData "${DATADIR}">
]>
<map
xmlns:map="http://my.dummy.org/map"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsschemaLocation="http://my.dummy.org/map main.xsd"
>
&emptyEntry; <!-- from globals.xml -->
<entry><key>KEY</key><value>VALUE</value></entry>
<entry><key>KEY</key><value>VALUE</value></entry>
</map>
with globals.xml
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY emptyEntry "<entry></entry>">
Now I would like to move from non-standard lxml
to standard xml.etree
. But this fails with my file because the load_dtd=True
and resolve_entities=True
is not supported by xml.etree
.
Is there an xml.etree
-way to have these entities resolved?
lxml is a right tool for the job.
But, if you want to use stdlib, then be prepared for difficulties and take a look at XMLParser's UseForeignDTD method. Here's a good (but hacky) example: Python ElementTree support for parsing unknown XML entities?