I am trying to parse an XML file and arrange it into a table, separating the contents into isElement, isAttribute, Value, and Text.
How do I use the ElementTree module to achieve this? I know this is possible using the minidom module.
The reason I want to use ElementTree is its efficiency. An example of what I am trying to achieve is available here: http://python.zirael.org/e-gtk-treeview4.html
Any advice on how to separate the XML contents into elements, subelements, etc. using the ElementTree module?
This is what I have so far:
import xml.etree.cElementTree as ET
filetree = ET.ElementTree(file = "some_file.xml")
for child in filetree.iter():
    print child.tag, child.text, child.attrib
For the following example xml file:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
I get this as output:
data
{}
country
{'name': 'Liechtenstein'}
rank 1 {}
year 2008 {}
gdppc 141100 {}
neighbor None {'direction': 'E', 'name': 'Austria'}
neighbor None {'direction': 'W', 'name': 'Switzerland'}
country
{'name': 'Singapore'}
rank 4 {}
year 2011 {}
gdppc 59900 {}
neighbor None {'direction': 'N', 'name': 'Malaysia'}
country
{'name': 'Panama'}
rank 68 {}
year 2011 {}
gdppc 13600 {}
neighbor None {'direction': 'W', 'name': 'Costa Rica'}
neighbor None {'direction': 'E', 'name': 'Colombia'}
I did find something similar in another post, but it uses the DOM module: Walk through all XML nodes in an element-nested structure
Based on the comment received, this is what I want to achieve:
data (type Element)
    country(Element)
        Text = None
        name(Attribute)
            value: Liechtenstein
        rank(Element)
            Text = 1
        year(Element)
            Text = 2008
        gdppc(Element)
            Text = 141100
        neighbour(Element)
            name(Attribute)
                value: Austria
            direction(Attribute)
                value: E
        neighbour(Element)
            name(Attribute)
                value: Switzerland
            direction(Attribute)
                value: W
    country(Element)
        Text = None
        name(Attribute)
            value: Singapore
        rank(Element)
            Text = 4
I want to be able to present my data in a tree-like structure as above. To do this I need to keep track of the relationships between the nodes. I hope this clarifies the question.
Element objects are sequences containing their direct child elements. XML attributes are stored in a dictionary mapping attribute names to values. There are no text nodes as in DOM. Text is stored in the text and tail attributes: text within the element but before the first subelement is stored in text, and text between the end of that element and the next one is stored in tail. So if we take the gtk-treeview4-2.py example from TreeView IV. - display of trees, we have to rewrite this DOM code:
# ...
import xml.dom.minidom as dom
# ...

    def create_interior(self):
        # ...
        doc = dom.parse(self.filename)
        self.add_element_to_treestore(doc.childNodes[0], None)
        # ...

    def add_element_to_treestore(self, e, parent):
        if isinstance(e, dom.Element):
            me = self.model.append(parent, [e.nodeName, 'ELEMENT', ''])
            for i in range(e.attributes.length):
                a = e.attributes.item(i)
                self.model.append(me, ['@' + a.name, 'ATTRIBUTE', a.value])
            for ch in e.childNodes:
                self.add_element_to_treestore(ch, me)
        elif isinstance(e, dom.Text):
            self.model.append(
                parent, ['text()', 'TEXT_NODE', e.nodeValue.strip()])
by the following using ElementTree:
# ...
from xml.etree import ElementTree as etree
# ...

    def create_interior(self):
        # ...
        doc = etree.parse(self.filename)
        self.add_element_to_treestore(doc.getroot())
        # ...

    def add_element_to_treestore(self, element, parent=None):
        path = self.model.append(parent, [element.tag, 'ELEMENT', ''])
        for name, value in sorted(element.attrib.iteritems()):
            self.model.append(path, ['@' + name, 'ATTRIBUTE', value])
        if element.text:
            self.model.append(
                path, ['text()', 'TEXT_NODE', element.text.strip()]
            )
        for child in element:
            self.add_element_to_treestore(child, path)
        if element.tail:
            # The tail follows this element's end tag, so it is added
            # under the parent row rather than under the element itself.
            self.model.append(
                parent, ['text()', 'TEXT_NODE', element.tail.strip()]
            )
Screenshot with your example data and the first subtree fully expanded:
Update: Screenshot of example data and relevant import lines in code added.
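If you want to try the traversal without GTK, here is a minimal sketch of the same idea as a plain generator. It assumes Python 3 (the code above targets Python 2 with PyGTK), and the names walk and SAMPLE are made up for illustration. Unlike the code above, it skips whitespace-only text, and it reports tail text at the level of the element it follows.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Yield (depth, name, kind, value) rows for element and its subtree."""
    yield depth, element.tag, "ELEMENT", ""
    # Attributes live in the element.attrib dictionary.
    for name, value in sorted(element.attrib.items()):
        yield depth + 1, "@" + name, "ATTRIBUTE", value
    # Text before the first child is stored on the element itself.
    if element.text and element.text.strip():
        yield depth + 1, "text()", "TEXT_NODE", element.text.strip()
    for child in element:
        yield from walk(child, depth + 1)
        # Text after a child's end tag is that child's tail; it sits
        # next to the child, inside the current element.
        if child.tail and child.tail.strip():
            yield depth + 1, "text()", "TEXT_NODE", child.tail.strip()


# Shortened version of the example data from the question.
SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <neighbor name="Austria" direction="E"/>
  </country>
</data>"""

rows = list(walk(ET.fromstring(SAMPLE)))
for depth, name, kind, value in rows:
    # Indent each row by its depth to show the parent/child relationship.
    print("    " * depth + name, kind, value)
```

The depth value is what keeps track of the relationship between rows; in a GUI you would pass the parent row instead, as add_element_to_treestore does. For a file on disk, ET.parse(filename).getroot() can replace ET.fromstring(SAMPLE).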