
How to access nested XML tags for comparison using Python?


I have this original XML, which needs to be modified:

            <COUNTRY>
                <NAME>Place ="MALTA"</NAME>
                <DETAILS ID = "tag1"/>
                    <EUROPE CAPITAL="Valletta" />
                    <EUROPE population=123456 />
                    <EUROPE tag = "new"/>
                </DETAILS>
                <DETAILS ID = "tag2"/>
                    <EUROPE CAPITAL="NEW_CAPITAL" />
                    <EUROPE GDP=66666666 />
                    <EUROPE tag = "new"/>
                </DETAILS>
                <DETAILS ID = "tag3"/>
                    <EUROPE CLIMATE="Warm" />
                    <EUROPE Votes=123 />
                    <EUROPE tag = "new"/>
                </DETAILS>
            </COUNTRY>

Now I need to modify this XML after comparing tags. Here I need to compare the ID under COUNTRY/DETAILS, for example: if ID == "tag1", add a new tag (<EUROPE tag = "tag1"/>); if ID == "tag2", add (<EUROPE tag = "tag2"/>). Basically, I'm trying to modify a particular block of the XML using its "TEXT" as a reference instead of the TAG or its ATTRIBUTE. TL;DR: the explanation might be a little confusing, so the attempted code further down may help.

Expected result:

           <COUNTRY>
                <NAME>Place ="MALTA"</NAME>
                <DETAILS ID = "tag1"/>
                    <EUROPE CAPITAL="Valletta" />
                    <EUROPE population=123456 />
                    <EUROPE tag = "new"/>
                    <EUROPE tag = "tag1"/>
                </DETAILS>
                <DETAILS ID = "tag2"/>
                    <EUROPE CAPITAL="NEW_CAPITAL" />
                    <EUROPE GDP=66666666 />
                    <EUROPE tag = "new"/>
                    <EUROPE tag = "tag2"/>
                </DETAILS>
                <DETAILS ID = "tag3"/>
                    <EUROPE CLIMATE="Warm" />
                    <EUROPE Votes=123 />
                    <EUROPE tag = "new"/>
                </DETAILS>
            </COUNTRY>

STEP 1 - Compare the ID (if ID == "tag1")

STEP 2 - Do something if the comparison succeeds (in this case, add <EUROPE tag = "tag1"/>)

I tried the approach below, but it wasn't successful. When I try to iterate through the "details" variable, it's empty, so I'm not sure the expression is matching the XML entries I specified.

tree = ET.parse('abc.xml')
root = tree.getroot()
details= tree.findall(".//COUNTRY[DETAILS='ID:\"tag1\"')
for d in details:
     d.append(ET.fromstring('<EUROPE tag = "tag1"/>'))
details2= tree.findall(".//COUNTRY[DETAILS='ID:\"tag2\"')
for d in details2:
     d.append(ET.fromstring('<EUROPE tag = "tag2"/>'))


Solution

  • As mentioned in the comments on your question, neither your sample XML nor your expected output is well-formed. But assuming your sample XML is fixed like so:

    <COUNTRY>
      <NAME>Place ="MALTA"</NAME>
      <DETAILS ID = "tag1">
        <EUROPE CAPITAL="Valletta" />
        <EUROPE population="123456" />
        <EUROPE tag = "new"/>
      </DETAILS>
      <DETAILS ID = "tag2">
        <EUROPE CAPITAL="NEW_CAPITAL" />
        <EUROPE GDP="66666666" />
        <EUROPE tag = "new"/>
      </DETAILS>
      <DETAILS ID = "tag3">
        <EUROPE CLIMATE="Warm" />
        <EUROPE Votes="123" />
      </DETAILS>
    </COUNTRY>
    

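    and that I understand your question correctly, your main issue is with your XPath expression .//COUNTRY[DETAILS='ID:\"tag1\", which seems to confuse elements and attributes: a predicate of the form [DETAILS='...'] compares the text content of a DETAILS child, whereas ID is an attribute of DETAILS (the predicate is also missing its closing bracket). Reading the ID attribute of each DETAILS element directly and adding a matching EUROPE child as its last child should work: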

    import xml.etree.ElementTree as ET

    root = ET.parse('abc.xml').getroot()
    for details in root.findall('.//DETAILS'):
        # build the new element from this <DETAILS> element's ID attribute
        new_europe = ET.fromstring(f'<EUROPE tag = "{details.get("ID")}"/>')
        # append() places it after the existing <EUROPE> children, however
        # many there are in each <DETAILS>
        details.append(new_europe)
    # indent() works with Python 3.9 and above; otherwise just delete it
    ET.indent(root, space=' ', level=2)
    print(ET.tostring(root).decode())
    

    Output:

    <COUNTRY>
       <NAME>Place ="MALTA"</NAME>
       <DETAILS ID="tag1">
        <EUROPE CAPITAL="Valletta" />
        <EUROPE population="123456" />
        <EUROPE tag="new" />
        <EUROPE tag="tag1" />
       </DETAILS>
       <DETAILS ID="tag2">
        <EUROPE CAPITAL="NEW_CAPITAL" />
        <EUROPE GDP="66666666" />
        <EUROPE tag="new" />
        <EUROPE tag="tag2" />
       </DETAILS>
       <DETAILS ID="tag3">
        <EUROPE CLIMATE="Warm" />
        <EUROPE Votes="123" />
        <EUROPE tag="tag3" />
       </DETAILS>
      </COUNTRY>
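
    Note that the loop above tags every DETAILS element, including tag3, while the expected output in the question leaves tag3 untouched. If only particular IDs should be modified, one option is an attribute predicate, [@ID='...'], which ElementTree's limited XPath support does accept. The following is only a sketch under assumptions: the corrected sample is inlined as a string so it runs on its own, and wanted_ids is a hypothetical list introduced for illustration, not something from the question.

```python
import xml.etree.ElementTree as ET

# Corrected sample XML, inlined so the sketch is self-contained.
XML = """<COUNTRY>
  <NAME>Place ="MALTA"</NAME>
  <DETAILS ID="tag1">
    <EUROPE CAPITAL="Valletta" />
    <EUROPE population="123456" />
    <EUROPE tag="new" />
  </DETAILS>
  <DETAILS ID="tag2">
    <EUROPE CAPITAL="NEW_CAPITAL" />
    <EUROPE GDP="66666666" />
    <EUROPE tag="new" />
  </DETAILS>
  <DETAILS ID="tag3">
    <EUROPE CLIMATE="Warm" />
    <EUROPE Votes="123" />
  </DETAILS>
</COUNTRY>"""

root = ET.fromstring(XML)

# Hypothetical list of IDs to modify; tag3 is deliberately left out.
wanted_ids = ["tag1", "tag2"]

for wanted in wanted_ids:
    # [@ID='...'] tests the ID attribute of DETAILS itself, whereas
    # [DETAILS='...'] would test the text content of a DETAILS child.
    for details in root.findall(f".//DETAILS[@ID='{wanted}']"):
        # SubElement creates the new element and appends it as the last child.
        ET.SubElement(details, "EUROPE", {"tag": wanted})

print(ET.tostring(root, encoding="unicode"))
```

    With a parsed file instead of an inline string, the same loop applies to tree.getroot(), and tree.write() can save the modified document.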