
How to read only a specific chapter of an XML file in Python?


I have an XML file with the following structure:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<book>
    <toc>
        <tocdiv pagenum="564">
            <title>9thmemo</title>
            <tocdiv pagenum="588">
                <title>b</title>
            </tocdiv>
        </tocdiv>
    </toc>
    <chapter>
        <title>9thmemo</title>
        <para>...</para>
        <para>...</para>
    </chapter>
    <chapter>...</chapter>
    <chapter>...</chapter>
</book>

There are several chapters in the <book>...</book>, and each chapter has a title. I only want to read the full content of the chapter titled "9thmemo" (not the others). I tried to read it with the following code:

from xml.dom import minidom

filename = "result.xml"
file = minidom.parse(filename)
chapters = file.getElementsByTagName('chapter')
for chapter in chapters:
    print(chapter)  # prints something like <DOM Element: chapter at 0x...>

This only prints the object representation (the memory address) of each chapter, not its content. If I try to access a child element, e.g. chapters[i].title, I get an error saying the attribute cannot be found.
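For context on that error: minidom does not expose child elements as Python attributes, so chapters[i].title raises AttributeError. Children have to be reached through DOM methods such as getElementsByTagName or childNodes. A minimal sketch, assuming a trimmed inline copy of the XML instead of result.xml; the helper text_of and the second chapter's title "other" are made up for illustration:

```python
from xml.dom import minidom

# Trimmed inline copy of the question's XML; for the real file use minidom.parse("result.xml").
# The second chapter's title "other" is hypothetical, added only for illustration.
xml_text = """<book>
    <chapter><title>9thmemo</title><para>A</para><para>B</para></chapter>
    <chapter><title>other</title><para>C</para></chapter>
</book>"""


def text_of(node):
    """Hypothetical helper: concatenate the text and CDATA data found under a minidom node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


doc = minidom.parseString(xml_text)
titles = []
for chapter in doc.getElementsByTagName("chapter"):
    # chapter.title would raise AttributeError: minidom does not map child tags to attributes.
    title_nodes = chapter.getElementsByTagName("title")
    if title_nodes:
        titles.append(text_of(title_nodes[0]))

print(titles)  # ['9thmemo', 'other']
```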


Solution

  • I only want to read all content of this chapter, "9thmemo" (not others)

    The problem with your code is that it never tries to locate the specific chapter; it just prints every chapter element. The code below uses ElementTree's limited XPath support to find the chapter whose title is "9thmemo".

    Try the code below:

    import xml.etree.ElementTree as ET
    
    
    xml = '''<?xml version="1.0" encoding="UTF-8"?>
    <book>
       <toc>
          <tocdiv pagenum="564">
             <title>9thmemo</title>
             <tocdiv pagenum="588">
                <title>b</title>
             </tocdiv>
          </tocdiv>
       </toc>
       <chapter>
          <title>9thmemo</title>
          <para>A</para>
          <para>B</para>
       </chapter>
       <chapter>...</chapter>
       <chapter>...</chapter>
    </book>'''
    
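    # For the real file, use: root = ET.parse("result.xml").getroot()
    root = ET.fromstring(xml)
    # Find the first <chapter> at any depth whose <title> child's text is exactly "9thmemo"
    chapter = root.find('.//chapter[title="9thmemo"]')
    if chapter is None:
        raise SystemExit('No chapter titled "9thmemo" found')
    # Join the text of the chapter's direct <para> children
    para_data = ','.join(p.text or '' for p in chapter.findall('para'))
    print(para_data)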
    

    output

    A,B
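If you would rather stay with minidom, as in the question, the same lookup can be sketched by walking each chapter's direct child elements. This is a sketch under assumptions, not the answer's method: text_of and chapter_paras are hypothetical helpers, and the XML is an inline copy rather than result.xml (use minidom.parse("result.xml") for the real file). Because only <chapter> elements are scanned, the matching <title> inside <toc> is ignored.

```python
from xml.dom import minidom

# Inline copy of the sample XML; for the real file use minidom.parse("result.xml").
xml_text = """<book>
   <toc>
      <tocdiv pagenum="564">
         <title>9thmemo</title>
      </tocdiv>
   </toc>
   <chapter>
      <title>9thmemo</title>
      <para>A</para>
      <para>B</para>
   </chapter>
   <chapter>...</chapter>
</book>"""


def text_of(node):
    """Hypothetical helper: concatenate the text and CDATA data found under a minidom node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


def chapter_paras(doc, wanted_title):
    """Hypothetical helper: <para> texts of the first <chapter> whose direct <title> matches, else None."""
    for chapter in doc.getElementsByTagName("chapter"):
        # Keep only direct child elements, so nested titles or paras are ignored
        children = [n for n in chapter.childNodes if n.nodeType == n.ELEMENT_NODE]
        titles = [n for n in children if n.tagName == "title"]
        if titles and text_of(titles[0]) == wanted_title:
            return [text_of(n) for n in children if n.tagName == "para"]
    return None


doc = minidom.parseString(xml_text)
paras = chapter_paras(doc, "9thmemo")
print(",".join(paras))  # A,B
```

Returning None when no chapter matches lets the caller decide whether a missing chapter is an error.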