I'm trying to convert a protobuf feed to a pandas DataFrame for one of my hobby projects. I tried several different techniques to accomplish this, but nothing seems to really solve my issue.
I use the following code to retrieve a GTFS-RT TripUpdates feed:
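import requests
from google.transit import gtfs_realtime_pb2
from protobuf_to_dict import protobuf_to_dict

feed = gtfs_realtime_pb2.FeedMessage()
headers = {
    'Accept': 'application/octet-stream',
    'Accept-encoding': 'br, gzip, deflate'
}
response = requests.get('<PROVIDER:APIKEY>', headers=headers, stream=True)
# Parse the raw protobuf bytes into the FeedMessage
feed.ParseFromString(response.content)
test_dict = protobuf_to_dict(feed)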
The result of using protobuf_to_dict is a dict that prints as one single line:
{'header': {'gtfs_realtime_version': '2.0', 'incrementality': 0, 'timestamp': 1641582104}, 'entity': [{'id': '14050001276385923' [...]
I've tried several things to get around this issue.
Reading the feed message as JSON: this did not work because the JSON object must be str, bytes or bytearray, not dict.
Iterating through dict:
for entity in test_dict.entity:
    if entity.HasField('vehicle'):
        [logic for building dataframe]
It didn't work either, because 'dict' object has no attribute 'entity'.
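The error comes from the fact that protobuf_to_dict returns plain Python dicts, so fields are read by key and presence is checked with `in` rather than with HasField. A minimal sketch of the same loop on the dict, assuming a hand-made sample in the shape shown above (the entity values here are invented for illustration, and `vehicle_ids` is a hypothetical name):

```python
# Hand-made sample in the shape protobuf_to_dict returns; entity values are invented.
test_dict = {
    'header': {'gtfs_realtime_version': '2.0', 'incrementality': 0, 'timestamp': 1641582104},
    'entity': [
        {'id': '1', 'trip_update': {'trip': {'trip_id': 'T1'}}},
        {'id': '2', 'vehicle': {'trip': {'trip_id': 'T2'}}},
    ],
}

vehicle_ids = []
for entity in test_dict['entity']:   # key lookup instead of attribute access
    if 'vehicle' in entity:          # dicts have no HasField(); test for the key
        vehicle_ids.append(entity['id'])
```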
Ok! After several hours of reading I tried to flatten and normalize the feed message as described here and in some other threads. Unfortunately, neither json_normalize nor flatten_json solved the issue.
At this point I feel like I'm going in circles and not seeing something very obvious that might help me. The end goal is to create a DataFrame containing the TripUpdates data, which will later be merged with another DataFrame to update arrival and departure times.
The issue can be solved by iterating through the feed message directly with simple for loops:
feed = gtfs_realtime_pb2.FeedMessage()
headers = {
    'Accept': 'application/octet-stream',
    'Accept-encoding': 'br, gzip, deflate'
}
response = requests.get('<PROVIDER:APIKEY>', headers=headers, stream=True)
feed.ParseFromString(response.content)
for entity in feed.entity:
    if entity.HasField('trip_update'):
        # Accessing values in feed message
        if entity.trip_update.trip.trip_id == something:
            [add to list]
Later, the list is converted to a pandas DataFrame.
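The loop above could be wrapped in a small helper that collects one row per stop time update. This is only a sketch: the function name `trip_update_rows`, the optional `trip_id` filter, and the output column names are my own choices for illustration, while `stop_time_update`, `stop_id`, `arrival`, `departure` and `time` are the field names from the GTFS Realtime TripUpdate message.

```python
def trip_update_rows(feed, trip_id=None):
    """Return one dict per stop_time_update found in the feed's trip updates."""
    rows = []
    for entity in feed.entity:
        if not entity.HasField('trip_update'):
            continue  # skip entities that carry no trip update
        trip = entity.trip_update.trip
        if trip_id is not None and trip.trip_id != trip_id:
            continue  # optional filter on a single trip
        for stu in entity.trip_update.stop_time_update:
            rows.append({
                'trip_id': trip.trip_id,
                'stop_id': stu.stop_id,
                # Check presence first, so a missing arrival/departure becomes None
                'arrival_time': stu.arrival.time if stu.HasField('arrival') else None,
                'departure_time': stu.departure.time if stu.HasField('departure') else None,
            })
    return rows
```

With the parsed feed from above and pandas imported as pd, pd.DataFrame(trip_update_rows(feed)) should then give a DataFrame with one row per stop time update, which can be merged on trip_id and stop_id.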