python gis openstreetmap osmium pyosmium

Enable Handler Class Methods to Yield Instance Attributes


This question is specifically about the pyosmium package. Is the following functionality possible and, if not, how could it be implemented?

I want to stream (yield) certain instance attributes instead of accumulating them in memory.

Currently we can do the following:

import osmium

class Handler(osmium.SimpleHandler):
    def __init__(self):
        osmium.SimpleHandler.__init__(self)
        self.edge_and_nodes = []  # grows with every way in the file

    def way(self, w):
        # Store plain ids only, one dict per way.
        self.edge_and_nodes.append({'edge_id': w.id,
                                    'nodes': [n.ref for n in w.nodes]})

h = Handler()
h.apply_file("test.osm.pbf")
print("Edges and their connected nodes: {}".format(h.edge_and_nodes))

However, this does not scale to large regions, because every way is held in memory at once.

I would like to yield, for every way, a dictionary containing the way ID and its node IDs (as well as tags, etc.). Is this possible?

I am looking for something like this:

class StreamHandler(osmium.SimpleHandler):
    def __init__(self):
        osmium.SimpleHandler.__init__(self)

    def way(self, w):
        yield {'edge_id': w.id,
               'nodes': [n.ref for n in w.nodes]}

h = StreamHandler()
h.apply_file("test.osm.pbf")
for row in h.way(w):  # w is not defined here; this is the part I can't work out
    print(row)

But I am not sure how to pass the w parameter (the way object), since that seems to be handled internally by the apply_file() method, and I can't find the source code for that method.

Thanks!

Edit: the source code can be found here


Solution

  • I found a workaround. Using pydriosm, I was able to add some custom generators that parse and stream *.osm.pbf files entirely in Python. This suits a Spark or Dataflow job that streams the data into a database.
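
A possible alternative that stays with pyosmium, offered as a sketch rather than a tested recipe: putting `yield` inside `way()` turns the method into a generator function, so its body does not run unless something iterates the returned generator, which `apply_file()` is not expected to do. The handler can instead push each row into a bounded queue while `apply_file()` runs in a worker thread, and a generator on the consumer side drains that queue. The code below shows only this bridging pattern using the standard library. `stream_from_callbacks` and `fake_apply_file` are hypothetical names, and `fake_apply_file` is a stand-in for a handler whose `way()` calls `emit(...)` instead of appending to a list.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream_from_callbacks(run, maxsize=1000):
    """Yield every value that `run` passes to its `emit` argument.

    `run(emit)` is executed in a worker thread. The queue is bounded,
    so the producer blocks when the consumer falls behind and memory
    use stays roughly constant.
    """
    q = queue.Queue(maxsize=maxsize)
    errors = []

    def worker():
        try:
            run(q.put)
        except Exception as exc:  # re-raised in the consumer below
            errors.append(exc)
        finally:
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item

    if errors:
        raise errors[0]


def fake_apply_file(emit):
    # Hypothetical stand-in for Handler.apply_file(): a real handler's
    # way() method would build the dict from w.id and w.nodes.
    for way_id, refs in [(1, [10, 11]), (2, [11, 12, 13])]:
        emit({'edge_id': way_id, 'nodes': refs})


rows = []
for row in stream_from_callbacks(fake_apply_file):
    rows.append(row)
    print(row)
```

With pyosmium, `run` would be something like `lambda emit: MyHandler(emit).apply_file("test.osm.pbf")`, where the hypothetical `MyHandler.way()` copies `w.id` and the node refs into a plain dict before calling `emit`, since way objects should not be kept after the callback returns. If the consumer stops iterating early, the worker stays blocked on the full queue, so a real job would need some cancellation path.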