Tags: python, xml, xml-parsing, xml.etree

How to retrieve all values of a specific attribute from sub-elements that contain this attribute?


I have the following XML file:

<main>
  <node>
    <party iot="00">Big</party>
    <children type="me" value="3" iot="A">
       <p>
          <display iot="B|S">
             <figure iot="FF"/>
          </display>
       </p>
       <li iot="C"/>
       <ul/>
    </children>
  </node>
  <node>
    <party iot="01">Small</party>
    <children type="me" value="1" iot="N">
       <p>
          <display iot="T|F">
             <figure iot="MM"/>
          </display>
       </p>
    </children>
  </node>
</main>

How can I retrieve all values of the iot attribute from the children element of the first node and from all of its sub-elements? I need the values as a list.

The expected result:

iot_list = ['A','B|S','FF','C']

This is my current code:

import xml.etree.ElementTree as ET

mytree = ET.parse("file.xml")
myroot = mytree.getroot()
list_nodes = myroot.findall('node')
for n in list_nodes:
   # ???

Solution

  • This is easier to do using the lxml library:

    If the sample XML in your question represents the exact structure of the actual XML:

    from lxml import etree
    data = """[your xml above]"""
    doc = etree.XML(data)
    
    print(doc.xpath('//node[1]//*[not(self::party)][@iot]/@iot'))
    

    More generically, selecting the children element and everything below it instead of excluding party by name:

    for t in doc.xpath('//node[1]//children'):
        print(t.xpath('.//descendant-or-self::*/@iot'))
    

    In either case, the output should be

    ['A', 'B|S', 'FF', 'C']
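
Since the question uses xml.etree.ElementTree, the same list can probably be built without lxml as well. The following is only a minimal sketch under a few assumptions: the XML has the structure shown in the question, the first node has a direct children child, and the names XML_DATA and collect_iot are made up for illustration. It relies on Element.iter(), which yields the element itself followed by all of its descendants in document order.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (inlined here instead of reading file.xml).
XML_DATA = """
<main>
  <node>
    <party iot="00">Big</party>
    <children type="me" value="3" iot="A">
       <p>
          <display iot="B|S">
             <figure iot="FF"/>
          </display>
       </p>
       <li iot="C"/>
       <ul/>
    </children>
  </node>
  <node>
    <party iot="01">Small</party>
    <children type="me" value="1" iot="N">
       <p>
          <display iot="T|F">
             <figure iot="MM"/>
          </display>
       </p>
    </children>
  </node>
</main>
"""


def collect_iot(xml_text):
    """Hypothetical helper: iot values under <children> of the first <node>."""
    root = ET.fromstring(xml_text)
    first_node = root.find("node")            # first <node> child of <main>
    if first_node is None:
        return []
    children = first_node.find("children")    # direct <children> child
    if children is None:
        return []
    # iter() visits <children> itself, then every descendant, in document order;
    # elements without an iot attribute (<p>, <ul>) are skipped.
    return [el.get("iot") for el in children.iter() if "iot" in el.attrib]


iot_list = collect_iot(XML_DATA)
print(iot_list)
```

For the sample data this should give the same list as the lxml version. To run it against the file from the question, ET.parse("file.xml").getroot() can take the place of ET.fromstring(xml_text).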