python xml iterparse

How to find the starting element name in XML using iterparse


I have the following sample XML:

<osm version="0.6" generator="CGImap 0.3.3 (28791 thorn-03.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
 <bounds minlat="41.9704500" minlon="-87.6928300" maxlat="41.9758200" maxlon="-87.6894800"/>
 <node id="261114295" visible="true" version="7" changeset="11129782" timestamp="2012-03-28T18:31:23Z" user="bbmiller" uid="451048" lat="41.9730791" lon="-87.6866303"/>

I want to extract the bounds and node elements from this XML using Python's iterparse. I have tried the following code snippet:

import xml.etree.ElementTree as ET
import pprint

def count_tags(filename):
    # Count how often each 'k' attribute value occurs on <tag> elements.
    mytags = {}
    for event, elem in ET.iterparse(filename, events=('end',)):
        if elem.tag == "tag":
            if elem.attrib['k'] in mytags:
                mytags[elem.attrib['k']] += 1
            else:
                mytags[elem.attrib['k']] = 1
    return mytags

However, I am not able to extract the bounds and node elements. What am I missing?


Solution

  • The loop in the question only matches elements named tag, so bounds and node are never picked up. Assuming bounds and node are one level under the root of the XML, this should work:

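    import xml.etree.ElementTree as ET

    def count_tags():
        mytags = {}
        # With no events argument, iterparse reports only 'end' events.
        for event, child in ET.iterparse('example.osm'):
            if child.tag in ('bounds', 'node'):
                # A later element with the same tag overwrites the earlier one.
                mytags[child.tag] = child.attrib
        print(mytags)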
    

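    Calling count_tags prints a dictionary like the following (reformatted here for readability; key order depends on the Python version):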

    {
        'node': {'changeset': '11129782', 'uid': '451048', 'timestamp': '2012-03-28T18:31:23Z', 'lon': '-87.6866303', 'visible': 'true', 'version': '7', 'user': 'bbmiller', 'lat': '41.9730791', 'id': '261114295'}, 
        'bounds': {'minlat': '41.9704500', 'maxlon': '-87.6894800', 'minlon': '-87.6928300', 'maxlat': '41.9758200'}
    }
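
The title also asks for the starting (root) element name, which the snippet above does not report. A minimal sketch of one way to get it, assuming the root is simply the first element for which iterparse emits a 'start' event: the function name root_and_children, its wanted parameter, and the shortened in-memory sample (with a closing </osm> added so it parses) are illustrative choices and not part of the original post. It also collects every matching element in a list instead of keeping only the last one.

```python
import io
import xml.etree.ElementTree as ET


def root_and_children(source, wanted=("bounds", "node")):
    """Return (root_tag, {tag: [attribute dicts]}) for the wanted tags.

    Hypothetical helper: `source` is a filename or a binary file object.
    """
    root_tag = None
    found = {tag: [] for tag in wanted}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # The first 'start' event belongs to the root element.
            if root_tag is None:
                root_tag = elem.tag
        elif elem.tag in found:
            # Copy the attributes before clear() empties them.
            found[elem.tag].append(dict(elem.attrib))
            elem.clear()  # drop the element's data to keep memory use low
    return root_tag, found


# Shortened version of the sample from the question (assumption: the
# real file is closed by </osm>).
sample = b"""<osm version="0.6">
 <bounds minlat="41.9704500" minlon="-87.6928300" maxlat="41.9758200" maxlon="-87.6894800"/>
 <node id="261114295" lat="41.9730791" lon="-87.6866303"/>
</osm>"""

root_tag, found = root_and_children(io.BytesIO(sample))
print(root_tag)  # osm
print(found["bounds"])
print(found["node"])
```

For a real file, passing the path (for example 'example.osm') instead of the BytesIO object should behave the same way, since iterparse accepts either a filename or a file object.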