
Why does dom.firstChild.firstChild.nodeValue print the text inside the root tag?


library.xml

<?xml version="1.0" encoding="utf-8"?>
<library>library-text. :D
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
        <date>2001</date>
        <author>A1 A1 A1 A1 A1</author>     
        <price>10.00</price>
    </book>
    <book isbn="2222222222">
        <title lang="en">T2 T2 T2 T2 T2</title>
        <date>2002</date>
        <author>A2 A2 A2 A2 A2</author>     
        <price>20.00</price>
    </book>
    <book isbn="3333333333">
        <title lang="en">T3 T3 T3 T3</title>
        <date>2003</date>
        <author>A3 A3 A3 A3 A3y</author>        
        <price>30.00</price>
    </book>
</library>

Python code

import xml.dom.minidom as minidom

xml_fname = "library.xml"

dom = minidom.parse(xml_fname) 

print(dom.firstChild.tagName)
print(dom.firstChild.firstChild.nodeValue)

Output

library
library-text. :D


Shouldn't it have been dom.firstChild.nodeValue?


Solution

  • Nodes in the DOM represent more than just elements: text is stored in nodes too. The first child node of the <library> element is a Text node, and its value is the Python string 'library-text. :D\n    ':

    >>> dom.firstChild.firstChild
    <DOM Text node "'library-te'...">
    >>> dom.firstChild.firstChild.nodeValue
    'library-text. :D\n    '
    

    Note that the nodeValue property of an Element is always null (None in Python). See the DOM Level 1 definition of Node:

    In cases where there is no obvious mapping of these attributes for a specific nodeType (e.g., nodeValue for an Element or attributes for a Comment), this returns null.

    Which value each node type exposes through Node.nodeValue is listed in the Definition Group NodeType section.

    The DOM API is a very bare-bones, basic API, aimed at compatibility with a very broad range of languages, and this is especially true for the DOM Level 1 specification (the only spec that minidom supports). You generally don't want to use it at all if you can possibly avoid it. In Python, use a higher-level API such as ElementTree instead: either the standard library's xml.etree.ElementTree module or the lxml library, which provides a more feature-rich, compatible implementation of the same API.

    Using ElementTree, you deal primarily with elements alone, and text is available through each element's text attribute (text right after its opening tag) and tail attribute (text right after its closing tag).
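To make the point about Text nodes concrete, here is a minimal sketch that walks the direct children of <library> with minidom. It assumes a trimmed copy of the question's library.xml inlined as a string and parsed with minidom.parseString, so it runs without the file; the question's own code reads the file with minidom.parse instead.

```python
import xml.dom.minidom as minidom

# Trimmed copy of the question's library.xml, inlined (assumption) so the
# sketch runs without the file on disk.
XML = """<?xml version="1.0" encoding="utf-8"?>
<library>library-text. :D
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
    </book>
</library>"""

dom = minidom.parseString(XML)
root = dom.documentElement  # the <library> Element, same object as dom.firstChild here

# An Element has no value of its own, so this prints None.
print(root.nodeValue)

# Text and elements are both child nodes of <library>.
for child in root.childNodes:
    if child.nodeType == child.TEXT_NODE:
        print("TEXT   ", repr(child.nodeValue))
    elif child.nodeType == child.ELEMENT_NODE:
        print("ELEMENT", child.tagName)
```

The loop reports a Text node, then the <book> Element, then another Text node holding the newline before </library>. The first of those Text nodes is exactly what dom.firstChild.firstChild returns.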
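For comparison, a minimal sketch of the same lookup with the standard library's xml.etree.ElementTree, again assuming a trimmed copy of library.xml inlined as a string. The root text comes straight from an element attribute, with no intermediate Text node:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's library.xml, inlined (assumption).
XML = """<?xml version="1.0" encoding="utf-8"?>
<library>library-text. :D
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
    </book>
</library>"""

root = ET.fromstring(XML)  # returns the <library> Element itself

# Text directly after <library> is stored on the element as .text
print(repr(root.text))  # 'library-text. :D\n    '

book = root[0]                   # first child element: <book>
print(book.get("isbn"))          # 1111111111
print(book.find("title").text)   # T1 T1 T1 T1 T1

# Text after </book> (up to </library>) is stored as book.tail
print(repr(book.tail))  # '\n'
```

For real use you would call ET.parse("library.xml").getroot() instead of fromstring. lxml.etree offers the same element API, but its fromstring rejects str input that carries an encoding declaration, so with lxml parse the file or pass bytes.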