python, xml, for-loop, enumerate, findall

Enumerate with empty findall result


I have an XML file in which I want to count the tags named 'neighbor'. More specifically, I want to count only the neighbor tags that are direct children of a country tag.

Here are the contents of my xml file:

<?xml version="1.0"?>
<data>
    <country name="Austria">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Liechtenstein"/>
        <neighbor name="Switzerland"/>
        <neighbor name="Italy"/>
    </country>
    <country name="Iceland">
        <hasnoneighbors/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia"/>
        <someothertag>
             <neighbor name="Germany"/>
        </someothertag>
    </country>
    <neighbor name="Jupiter"/>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica"/>
        <neighbor name="Colombia"/>
        <country name="SubCountry">
            <rank>12</rank>
            <year>2023</year>
            <neighbor name="NeighborOfSubCountry"/>
        </country>
    </country>
</data>

The expected result is 7: of the 9 neighbor tags in total, Germany (nested inside someothertag) and Jupiter (a direct child of data) should be left out.
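Both numbers can be checked with ElementTree path expressions. This is a minimal sketch that assumes the sample is embedded as a string (only so it runs without test.xml). It compares a path that matches neighbor elements whose parent is a country with one that matches every neighbor below data:

```python
import xml.etree.ElementTree as ET

# The sample document from the question, embedded so no file is needed.
XML = """<data>
    <country name="Austria">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Liechtenstein"/>
        <neighbor name="Switzerland"/>
        <neighbor name="Italy"/>
    </country>
    <country name="Iceland">
        <hasnoneighbors/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia"/>
        <someothertag>
             <neighbor name="Germany"/>
        </someothertag>
    </country>
    <neighbor name="Jupiter"/>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica"/>
        <neighbor name="Colombia"/>
        <country name="SubCountry">
            <rank>12</rank>
            <year>2023</year>
            <neighbor name="NeighborOfSubCountry"/>
        </country>
    </country>
</data>"""

root = ET.fromstring(XML)

# ".//country/neighbor": neighbor elements whose parent is a country (at any depth)
direct = root.findall(".//country/neighbor")
# ".//neighbor": every neighbor element anywhere below <data>
everything = root.findall(".//neighbor")

print(len(direct), len(everything))  # 7 9
```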

I've written the following piece of code:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

totalneighbors = 0
neighborlist = []

for country in root.iter('country'):
    print(f'Country {country.attrib["name"]} contains these neighbors:')
    for index, neighbor in enumerate(country.findall('neighbor')):
        neighborname = neighbor.attrib['name']
        print(f'neighbor no {index+1}, with name {neighbor.attrib["name"]}')
        neighborlist.append(neighbor.attrib['name'])
    print(f"total for this country is {index+1}\n")
    totalneighbors += index+1

print(f'total nr of neighbors in country-nodes is {totalneighbors} according to index-counting')
print(f"but the neighborlist says it's {len(neighborlist)}")

I wanted to count the tags with Python's enumerate, but it gives me the wrong result (10 instead of 7). I also added a second way of counting to the code: appending each 'findall' result to a list and taking the length of that list. That does give me the correct number.

After adding some print statements, I figured out where things go wrong: Iceland has no neighbors, but the print statement still reports a total of 3 for it. It looks as if the index from the previous loop was never reset and simply keeps its old value, even though 'findall' should find nothing.
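This matches ordinary Python behaviour rather than anything specific to enumerate or findall. A for loop only assigns to its target variable when the iterable yields an item, and the variable is not cleared afterwards. Here is a minimal reproduction that uses plain lists as hypothetical stand-ins for the findall results of Austria and Iceland:

```python
# Hypothetical stand-ins for country.findall('neighbor') on Austria and Iceland.
batches = [["Liechtenstein", "Switzerland", "Italy"], []]

stale = []    # what counting via the last enumerate index produces
correct = []  # what counting via the list length produces
for names in batches:
    for index, _ in enumerate(names):
        pass  # for an empty list this body, and the assignment to index, never runs
    stale.append(index + 1)     # reuses the index left over from the previous batch
    correct.append(len(names))  # counts the items directly

print(stale, correct)  # [3, 3] [3, 0]
```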

So my question is: What am I doing wrong? Why does 'enumerate' not give me 0 when 'findall' finds nothing? Am I using it wrong? Or is it just not possible when combined with an empty search result?

I hope someone can clarify what's going wrong here.


Solution

  • The problem lies in Iceland not having a neighbor, as you said. The first country has three neighbors, so index has the value 2 after the first inner loop finishes. For Iceland, findall returns an empty list, so the inner loop body never runs and index still holds the value from the previous country.

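    You can set index to -1 before the inner for loop. Then index+1 is 0 whenever findall finds nothing, so nothing is added to totalneighbors for a country without neighbors. This also avoids a NameError if the very first country has no neighbors, because index would otherwise never be assigned.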

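    for country in root.iter('country'):
        print(f'Country {country.attrib["name"]} contains these neighbors:')
        index = -1  # keeps index+1 at 0 when findall() returns an empty list
        for index, neighbor in enumerate(country.findall('neighbor')):
            print(f'neighbor no {index+1}, with name {neighbor.attrib["name"]}')
            neighborlist.append(neighbor.attrib['name'])
        print(f"total for this country is {index+1}\n")
        totalneighbors += index+1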
    

    But overall, I recommend using the lxml package and XPath. Here you can find the docs: https://lxml.de/parsing.html

    For your purpose, using XPath is the best option. You can find more information here: https://www.w3schools.com/xml/xpath_intro.asp

    The code using lxml would look something like this (the same findall() call also works with the standard library's xml.etree.ElementTree):

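    from lxml import etree
    
    tree = etree.parse("/path/to/file.xml")
    # ElementPath expression (a subset of XPath): neighbor elements that are direct children of a country, at any depth
    neighbors = tree.findall(".//country/neighbor")
    print(len(neighbors))  # 7 for the sample document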
    

    Hope this helps.
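A variant that avoids the stale-variable issue altogether counts with len() instead of the last enumerate index, which is essentially the neighborlist approach from the question. This is a sketch, not a definitive implementation. It assumes a trimmed copy of the sample XML that keeps the empty Iceland entry and the nested SubCountry, so it runs without test.xml. The name count_country_neighbors is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample (assumption: enough to show the
# empty-country and nested-country cases).
SAMPLE = """<data>
    <country name="Austria">
        <neighbor name="Liechtenstein"/>
        <neighbor name="Switzerland"/>
        <neighbor name="Italy"/>
    </country>
    <country name="Iceland">
        <hasnoneighbors/>
    </country>
    <country name="Panama">
        <neighbor name="Costa Rica"/>
        <country name="SubCountry">
            <neighbor name="NeighborOfSubCountry"/>
        </country>
    </country>
</data>"""


def count_country_neighbors(root):
    """Return (total, per_country) for <neighbor> children of every <country>.

    root.iter("country") also visits nested countries, while
    findall("neighbor") only matches direct children, so each neighbor
    is counted exactly once.
    """
    per_country = []
    for country in root.iter("country"):
        # len() of an empty list is 0, so there is no loop variable to reset
        per_country.append((country.get("name"), len(country.findall("neighbor"))))
    total = sum(count for _, count in per_country)
    return total, per_country


root = ET.fromstring(SAMPLE)
total, per_country = count_country_neighbors(root)
print(per_country)  # [('Austria', 3), ('Iceland', 0), ('Panama', 1), ('SubCountry', 1)]
print(total)        # 5
```

To run it on the real file, replace ET.fromstring(SAMPLE) with ET.parse('test.xml').getroot(). For the full document from the question, the total should then come out as 7.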