Tags: python, python-3.x, xml, xml-parsing, minidom

How can I read the attributes of a root node without getting them by name?


Suppose I have the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<library attrib1="att11" attrib2="att22">
    library-text
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
        <date>2001</date>
        <author>A1 A1 A1 A1 A1</author>     
        <price>10.00</price>
    </book>
    <book isbn="2222222222">
        <title lang="en">T2 T2 T2 T2 T2</title>
        <date>2002</date>
        <author>A2 A2 A2 A2 A2</author>     
        <price>20.00</price>
    </book>
    <book isbn="3333333333">
        <title lang="en">T3 T3 T3 T3</title>
        <date>2003</date>
        <author>A3 A3 A3 A3 A3y</author>        
        <price>30.00</price>
    </book>
</library>

I want to programmatically print the names and values of the root node's attributes.

How can I do that?

I tried the following code:

import xml.dom.minidom as minidom

xml_fname = "library.xml"

dom = minidom.parse(xml_fname) 

print(dom.firstChild.tagName)
print(dom.firstChild.attributes[0].value)

It gives the following error:

Traceback (most recent call last):
  File "main.py", line 8, in <module>
    print(dom.firstChild.attributes[0].value)
  File "/usr/lib/python3.8/xml/dom/minidom.py", line 552, in __getitem__
    return self._attrs[attname_or_tuple]
KeyError: 0

Solution

  • The DOM Node.attributes object is a NamedNodeMap, so you'll have to use the interface defined in the specification. You can't index into it by position: as the traceback shows, minidom treats attributes[0] as a lookup by attribute name, which is why you get KeyError: 0.

    The specification tells you there is a .length attribute and an item() method that returns a Node subtype. Here, those are Attr objects:

    >>> attributes = dom.firstChild.attributes
    >>> for i in range(attributes.length):
    ...     print(attributes.item(i))
    ...
    <xml.dom.minidom.Attr object at 0x10e47f6d0>
    <xml.dom.minidom.Attr object at 0x10e47f660>
    

    Each of those Attr objects has name and value attributes:

    >>> for i in range(attributes.length):
    ...     attr = attributes.item(i)
    ...     print(f"Name: {attr.name}, value: {attr.value}")
    ...
    Name: attrib1, value: att11
    Name: attrib2, value: att22
    

    I've said this in my previous answer, but I'll reiterate it here: The DOM API is extremely bare bones, and not at all Pythonic. It does not behave like you'd expect Python objects to behave. Use the ElementTree API if you want something more Pythonic. The ElementTree API elements have an .attrib attribute that's a Python dictionary, for example.
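
As a rough illustration of the ElementTree route, here is a minimal sketch. It assumes the XML is available as a string (a trimmed copy of the question's file is inlined so the snippet runs on its own), and root_attributes is a hypothetical helper name chosen for this example, not part of any library:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch is self-contained.
XML_TEXT = """<library attrib1="att11" attrib2="att22">
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
    </book>
</library>"""


def root_attributes(xml_text):
    """Return the root element's attributes as a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # fromstring() returns the root Element
    return dict(root.attrib)        # .attrib is already a dict; copy it


# No attribute names are hard-coded: iterate over whatever the root carries.
for name, value in root_attributes(XML_TEXT).items():
    print(f"Name: {name}, value: {value}")
```

To read from a file instead, ET.parse("library.xml").getroot().attrib should give the same dictionary for the question's library.xml.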