
How to edit specific blocks of XML using Python?


I have a sample XML file that looks something like this:

      <WESTERN>
            <TYPE>ABC</TYPE>
            <TYPE>MNO</TYPE>
            <COUNTRY>
                <NAME>MONACO</NAME>
                <DETAILS>           
                     <EUROPE CAPITAL="Monaco" />
                     <EUROPE population= "123456" />
                </DETAILS> 
            </COUNTRY>
            <COUNTRY>
                <NAME>MALTA</NAME>
                <DETAILS>
                    <EUROPE CAPITAL="Valletta" />
                    <EUROPE population= "123456" />
                </DETAILS>
                <DETAILS>
                    <EUROPE CONTINENT="EUROPE" />
                    <EUROPE GDP= "66666666"  />
                </DETAILS>
                <DETAILS>
                    <EUROPE CLIMATE="Warm" />
                    <EUROPE Votes= "123" />
                </DETAILS>
            </COUNTRY>
            <COUNTRY>
                 <NAME>ANDORRA</NAME>
                 <DETAILS>
                   <EUROPE CAPITAL="Andorra la Vella" />
                   <EUROPE population= "123456" />
                 </DETAILS>
           </COUNTRY>
        </WESTERN>

I need to add new tags to this XML. The new tags should be added only to the middle COUNTRY element, inside each of its DETAILS elements. I know the explanation is confusing, but here is what the expected result should look like:

      <WESTERN>
            <TYPE>ABC</TYPE>
            <TYPE>MNO</TYPE>
            <COUNTRY>
                <NAME>MONACO</NAME>
                <DETAILS>           
                     <EUROPE CAPITAL="Monaco" />
                     <EUROPE population= "123456" />
                </DETAILS> 
            </COUNTRY>
            <COUNTRY>
                <NAME>MALTA</NAME>
                <DETAILS>
                    <EUROPE CAPITAL="Valletta" />
                    <EUROPE population= "123456" />

                    <EUROPE tag = "NEW"/>
                </DETAILS>
                <DETAILS>
                    <EUROPE CONTINENT="EUROPE" />
                    <EUROPE GDP= "66666666" />

                    <EUROPE tag = "NEW"/>
                </DETAILS>
                <DETAILS>
                    <EUROPE CLIMATE="Warm" />
                    <EUROPE Votes= "123" />

                    <EUROPE tag = "NEW"/>
                </DETAILS>
            </COUNTRY>
            <COUNTRY>
                 <NAME>ANDORRA</NAME>
                 <DETAILS>
                   <EUROPE CAPITAL="Andorra la Vella" />
                   <EUROPE population= "123456" />
                 </DETAILS>
           </COUNTRY>
        </WESTERN>

I tried selecting the elements like this, but I am not sure how to modify only that specific COUNTRY element. The new entry is added inside every COUNTRY/DETAILS element instead of only those under COUNTRY[1]:

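import xml.etree.ElementTree as ET

tree = ET.parse('abc.xml')
root = tree.getroot()
tasks = tree.findall(".//COUNTRY")[1:2]  # the middle COUNTRY, but never used below
new_entry = tree.findall(".//DETAILS")   # matches DETAILS under every COUNTRY
for i in new_entry:
    i.append(ET.fromstring('<EUROPE tag = "NEW"/>'))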

Solution

  • Here is one way to do it. Find the <DETAILS> elements that apply to MALTA only:

    details = tree.findall(".//COUNTRY[NAME='MALTA']/DETAILS")
    
    for d in details:
        d.append(ET.fromstring('<EUROPE tag = "NEW"/>'))
    

    You could also do it by simply locating the second (middle) <COUNTRY> element (note that XPath position indexes start at 1):

    details = tree.findall(".//COUNTRY[2]/DETAILS")
    
    for d in details:
        d.append(ET.fromstring('<EUROPE tag = "NEW"/>'))
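
For reference, here is a minimal end-to-end sketch of the first approach. It is self-contained, so it parses a trimmed-down copy of the sample from a string rather than reading abc.xml. The `SAMPLE` constant and the `add_new_tag` helper are hypothetical names introduced for illustration, and `ET.SubElement` is used here as an alternative to `ET.fromstring` for building the new element.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for abc.xml (assumption: same layout as in the question).
SAMPLE = """
<WESTERN>
    <TYPE>ABC</TYPE>
    <COUNTRY>
        <NAME>MONACO</NAME>
        <DETAILS><EUROPE CAPITAL="Monaco" /></DETAILS>
    </COUNTRY>
    <COUNTRY>
        <NAME>MALTA</NAME>
        <DETAILS><EUROPE CAPITAL="Valletta" /></DETAILS>
        <DETAILS><EUROPE CONTINENT="EUROPE" /></DETAILS>
    </COUNTRY>
    <COUNTRY>
        <NAME>ANDORRA</NAME>
        <DETAILS><EUROPE CAPITAL="Andorra la Vella" /></DETAILS>
    </COUNTRY>
</WESTERN>
"""


def add_new_tag(root, country_name):
    """Append <EUROPE tag="NEW"/> to every DETAILS of the named COUNTRY.

    Hypothetical helper; returns the number of DETAILS elements changed.
    """
    details = root.findall(f".//COUNTRY[NAME='{country_name}']/DETAILS")
    for d in details:
        # Create the new child directly on the DETAILS element.
        ET.SubElement(d, "EUROPE", {"tag": "NEW"})
    return len(details)


root = ET.fromstring(SAMPLE)
changed = add_new_tag(root, "MALTA")
print(changed)
print(ET.tostring(root, encoding="unicode"))
```

When working with the real file, you would load it with `ET.parse('abc.xml')`, pass `tree.getroot()` to the helper, and save the result with `tree.write(...)`.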