I am working with a .pri file, which is in XML format, like the one below.
<?xml version="1.0"?>
<!DOCTYPE text SYSTEM "text.dtd">
<text id="fn000001">
<au id="fn000001.1" s="N00023">
<w id="fn000001.1.1"> hi </w>
<w id="fn000001.1.2"> there </w>
<l id="fn000001.1.3"> ? </l>
</au>
</text>
If I parse a single file with the code below, it works properly.
import xml.etree.ElementTree as ET
tree = ET.parse('/path/fn000001.pri')
root = tree.getroot()
print(root.get('id'))
Now I want to apply this to all the .pri files in the folder. For that, I am using the code below:
import glob
import xml.etree.ElementTree as ET
a = glob.glob('/path/*.pri')
for files in a:
    tree = ET.parse(files)
    print(tree)
That throws the following error:
tree = ET.parse(files)
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
tree.parse(source, parser)
File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: undefined entity &egrave;: line 147, column 52
Please suggest possible solutions. The related .dtd file is in the same folder.
OK! I came across many posts and answers related to this question. One comment explains the reason behind this error:
As I said, ElementTree does not support entities declared in a separate DTD file. Either declare entities in the XML file or use lxml. Or don't use entities at all.
So the main question is what to do so that entities declared in a separate DTD file are supported.
Solution 1: Declare entities in the XML file or use lxml
I tried lxml without declaring the entities in the XML file, and on its own that does not solve the problem. If you do declare the entities in the XML file, it does not matter whether you use the lxml package or not. So, declare the entities in the XML file as below.
<!DOCTYPE text [
<!ENTITY egrave "è">
]>
<text id="fn000001">
<au id="fn000001.1" s="N00023">
<w id="fn000001.1.1"> hi </w>
<w id="fn000001.1.2"> there </w>
<l id="fn000001.1.3"> ? </l>
</au>
</text>
This solution comes from the question "ParseError: undefined entity while parsing XML file in Python".
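As a quick check of Solution 1, here is a minimal sketch using only the standard library. The sample document and the word "très" are made up for illustration, and the entity value is written as the character reference &#232; so the snippet does not depend on how the file is encoded.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the entity is declared in the internal
# DTD subset, so the standard-library parser can expand it.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE text [
<!ENTITY egrave "&#232;">
]>
<text id="fn000001">
<au id="fn000001.1" s="N00023">
<w id="fn000001.1.1">tr&egrave;s</w>
</au>
</text>"""

root = ET.fromstring(SAMPLE)
word = root.find("./au/w").text  # entity reference is already expanded here
print(root.get("id"), word)
```

With the declaration in place, the reference is expanded during parsing and no ParseError is raised.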
But what if you have thousands of XML files and want to parse them all at once? Editing every file by hand does not scale.
Solution 2: Map the .dtd file with your .XML file
This takes just one extra line of code with the lxml package. See the snippet below.
from lxml import etree
parser = etree.XMLParser(dtd_validation=True)
tree = etree.parse("file.xml", parser)
All you need to pass is dtd_validation=True; the parser then loads the .dtd file referenced in the DOCTYPE and uses it to resolve the entities in your .XML file. Make sure the .dtd file is in the same directory as all your XML files.
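To tie this back to the loop from the question, here is a rough sketch of parsing every .pri file in a folder with one lxml parser. The function name parse_folder and the skip-on-error behaviour are my own assumptions, not part of the original code. Note that dtd_validation=True also validates each file, so a file that does not conform to the DTD raises etree.XMLSyntaxError.

```python
import glob
import os

from lxml import etree


def parse_folder(folder, pattern="*.pri"):
    """Parse every matching file and return {path: root element}.

    parse_folder is a hypothetical helper name used for illustration.
    """
    # dtd_validation=True makes lxml load the DTD named in each DOCTYPE,
    # validate against it, and use its entity declarations.
    parser = etree.XMLParser(dtd_validation=True)
    roots = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        try:
            roots[path] = etree.parse(path, parser).getroot()
        except etree.XMLSyntaxError as exc:
            # Assumption: report the bad file and carry on with the rest.
            print(f"skipped {path}: {exc}")
    return roots
```

Calling parse_folder('/path') and looping over the returned dictionary gives the same root.get('id') access as in the single-file case. If validation is stricter than you need, lxml's XMLParser also accepts load_dtd=True, which loads the DTD without validating; I have not tried that variant on these files.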