python-3.x, pandas, dataframe, protocol-buffers, gtfs

Converting protobuf feed to pandas dataframe


I'm trying to convert a protobuf feed to a pandas DataFrame for one of my hobby projects. I've tried several different techniques, but none of them really solves my issue.

I use the following code to retrieve a GTFS-RT TripUpdates feed:

import requests
from google.transit import gtfs_realtime_pb2
from protobuf_to_dict import protobuf_to_dict

feed = gtfs_realtime_pb2.FeedMessage()
headers = {
    'Accept': 'application/octet-stream',
    'Accept-encoding': 'br, gzip, deflate'
}

response = requests.get('<PROVIDER:APIKEY>', headers=headers, stream=True)

feed.ParseFromString(response.content)
test_dict = protobuf_to_dict(feed)

The result of protobuf_to_dict is a single nested dict that prints as one long line:

{'header': {'gtfs_realtime_version': '2.0', 'incrementality': 0, 'timestamp': 1641582104}, 'entity': [{'id': '14050001276385923' [...]

I've tried several things to get around this issue.

Reading the feed message as JSON with json.loads: this failed with TypeError: the JSON object must be str, bytes or bytearray, not dict.
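protobuf_to_dict already returns a Python dict, so there is nothing left to parse: json.loads only accepts a JSON string (str, bytes or bytearray), which is why it raises that TypeError. A minimal sketch, using a small hand-written dict as hypothetical sample data shaped like the output above, showing that the dict can be used directly and that json.dumps/json.loads only matter if you actually need JSON text:

```python
import json

# Hypothetical sample data shaped like the protobuf_to_dict output above
test_dict = {
    'header': {'gtfs_realtime_version': '2.0', 'incrementality': 0, 'timestamp': 1641582104},
    'entity': [{'id': '1', 'trip_update': {'trip': {'trip_id': 'A'}}}],
}

# json.loads expects JSON *text*; handing it a dict raises TypeError
error_message = None
try:
    json.loads(test_dict)
except TypeError as exc:
    error_message = str(exc)

# The dict is already usable as-is, via key lookups
header_timestamp = test_dict['header']['timestamp']

# Only if JSON text is needed: serialize first, then parsing works again
as_json_text = json.dumps(test_dict)
round_trip = json.loads(as_json_text)
```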

Iterating through the dict:

for entity in test_dict.entity:
    if entity.HasField('vehicle'):
        pass  # [logic for building dataframe]

That didn't work either: AttributeError: 'dict' object has no attribute 'entity'.
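A dict is accessed by key rather than by attribute, and HasField() only exists on protobuf message objects, not on the dicts produced from them. Assuming protobuf_to_dict omits fields that are not set, a key membership test plays the role of HasField(). A minimal sketch with hypothetical sample data shaped like the output above:

```python
# Hypothetical sample data shaped like the protobuf_to_dict output
test_dict = {
    'header': {'gtfs_realtime_version': '2.0', 'timestamp': 1641582104},
    'entity': [
        {'id': '1', 'trip_update': {'trip': {'trip_id': 'A', 'route_id': 'R1'}}},
        {'id': '2', 'vehicle': {'trip': {'trip_id': 'B'}}},
    ],
}

rows = []
for entity in test_dict['entity']:      # key lookup instead of test_dict.entity
    if 'trip_update' in entity:          # dict counterpart of entity.HasField('trip_update')
        trip = entity['trip_update']['trip']
        rows.append({
            'entity_id': entity['id'],
            'trip_id': trip.get('trip_id'),    # .get() returns None for absent keys
            'route_id': trip.get('route_id'),
        })
```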

OK! After several hours of reading, I tried to flatten and normalize the feed message as described here and in some other threads. Unfortunately, neither json_normalize nor flatten_json solved the issue.
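pd.json_normalize can handle this nesting once it is pointed at the right list. A sketch, assuming the dict layout shown above and that trip updates carry a stop_time_update list (the sample data is hypothetical): first collect the trip_update dicts, then record_path expands one row per stop time update while meta copies trip-level fields onto every row.

```python
import pandas as pd

# Hypothetical sample data shaped like the protobuf_to_dict output
test_dict = {
    'entity': [
        {'id': '1', 'trip_update': {
            'trip': {'trip_id': 'A', 'route_id': 'R1'},
            'stop_time_update': [
                {'stop_sequence': 1, 'stop_id': 'S1', 'arrival': {'delay': 60}},
                {'stop_sequence': 2, 'stop_id': 'S2', 'arrival': {'delay': 90}},
            ]}},
        {'id': '2', 'vehicle': {'trip': {'trip_id': 'B'}}},  # not a TripUpdate
    ],
}

# Keep only trip updates that actually contain stop time updates;
# json_normalize raises KeyError if record_path is missing from a record
trip_updates = [
    e['trip_update'] for e in test_dict['entity']
    if 'trip_update' in e and 'stop_time_update' in e['trip_update']
]

# One row per stop_time_update; nested dicts become dotted columns ('arrival.delay'),
# and the meta paths become the columns 'trip.trip_id' and 'trip.route_id'
df = pd.json_normalize(
    trip_updates,
    record_path='stop_time_update',
    meta=[['trip', 'trip_id'], ['trip', 'route_id']],
)
```

If some trips lack one of the meta fields, passing errors='ignore' to json_normalize fills those cells with NaN instead of raising.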

At this point I feel like I'm going in circles and missing something very obvious. The end goal is a DataFrame containing the TripUpdates data, which will later be merged with another DataFrame to update arrival and departure times.
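For the merge step, a minimal sketch, assuming the static schedule is already in a DataFrame with trip_id, stop_sequence and a scheduled arrival converted to a Unix timestamp, and that the realtime data has been flattened to one row per stop with an arrival delay in seconds. All column names here are hypothetical. A left merge keeps every scheduled stop, and stops without a realtime update keep their scheduled time.

```python
import pandas as pd

# Hypothetical static schedule; arrival already converted to a Unix timestamp (seconds)
schedule = pd.DataFrame({
    'trip_id': ['A', 'A', 'B'],
    'stop_sequence': [1, 2, 1],
    'scheduled_arrival': [1641582000, 1641582300, 1641582600],
})

# Hypothetical flattened realtime rows: one per stop time update, delay in seconds
realtime = pd.DataFrame({
    'trip_id': ['A'],
    'stop_sequence': [2],
    'arrival_delay': [120],
})

# Left merge: every scheduled stop is kept; unmatched stops get a NaN delay
merged = schedule.merge(realtime, on=['trip_id', 'stop_sequence'], how='left')

# No realtime update -> delay 0, so the estimate falls back to the schedule
merged['arrival_delay'] = merged['arrival_delay'].fillna(0).astype(int)
merged['estimated_arrival'] = merged['scheduled_arrival'] + merged['arrival_delay']
```

This sketch does not carry a delay forward to later stops of the same trip that have no update of their own; if that behaviour is wanted, it has to be added on top.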


Solution

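  • The issue can be solved by iterating over the parsed FeedMessage object itself (not the dict returned by protobuf_to_dict) with simple for loops; HasField() and attribute access such as entity.trip_update work on protobuf messages but not on plain dicts: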

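    import requests
    from google.transit import gtfs_realtime_pb2

    feed = gtfs_realtime_pb2.FeedMessage()
    headers = {
        'Accept': 'application/octet-stream',
        'Accept-encoding': 'br, gzip, deflate'
    }
    
    response = requests.get('<PROVIDER:APIKEY>', headers=headers, stream=True)
    response.raise_for_status()
    
    # Parse the binary protobuf payload into the FeedMessage object
    feed.ParseFromString(response.content)
    
    target_trip_id = '...'  # placeholder: the trip_id you want to match
    rows = []
    for entity in feed.entity:
        if entity.HasField('trip_update'):
            # Accessing values in the feed message
            if entity.trip_update.trip.trip_id == target_trip_id:
                rows.append({
                    'entity_id': entity.id,
                    'trip_id': entity.trip_update.trip.trip_id,
                    'route_id': entity.trip_update.trip.route_id,
                })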
    

    Afterwards, the list can be converted to a pandas DataFrame.
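The loop above can be extended into a complete conversion. A sketch, assuming the gtfs-realtime-bindings package (which provides google.transit.gtfs_realtime_pb2) and the field names of the GTFS Realtime specification; it walks the parsed FeedMessage directly, so HasField() is available, and produces one row per stop_time_update. The function name trip_updates_to_dataframe and the column names are hypothetical.

```python
import pandas as pd


def trip_updates_to_dataframe(feed):
    """Flatten the TripUpdate entities of a parsed GTFS-RT FeedMessage.

    Produces one row per stop_time_update; optional fields that are
    not set in the feed become None (NaN in numeric columns).
    """
    rows = []
    for entity in feed.entity:
        if not entity.HasField('trip_update'):
            continue  # skip VehiclePosition and Alert entities
        trip = entity.trip_update.trip
        for stu in entity.trip_update.stop_time_update:
            row = {
                'entity_id': entity.id,
                'trip_id': trip.trip_id,
                'route_id': trip.route_id,
                'stop_sequence': stu.stop_sequence if stu.HasField('stop_sequence') else None,
                'stop_id': stu.stop_id if stu.HasField('stop_id') else None,
            }
            # arrival and departure are optional StopTimeEvent messages
            for event_name in ('arrival', 'departure'):
                time_value = delay_value = None
                if stu.HasField(event_name):
                    event = getattr(stu, event_name)
                    if event.HasField('time'):
                        time_value = event.time
                    if event.HasField('delay'):
                        delay_value = event.delay
                row[f'{event_name}_time'] = time_value
                row[f'{event_name}_delay'] = delay_value
            rows.append(row)
    return pd.DataFrame(rows)
```

With the feed parsed as in the answer (feed.ParseFromString(response.content)), calling trip_updates_to_dataframe(feed) returns a DataFrame that can be merged on trip_id and stop_sequence. Checking HasField() before reading optional fields avoids silently treating protobuf default values (0 or an empty string) as real data.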