Tags: python, gtfs, transit

Splitting GTFS transit data into smaller feeds


I sometimes have a very large GTFS zip file that is valid for a period of 6 months, but loading that much data onto a low-resource EC2 server (for example, 2 GB of memory and 10 GB of disk) is not economical.

I hope to be able to split this large GTFS feed into 3 smaller GTFS zip files, each covering 2 months (6 months / 3 files) of valid data. Of course, that means I will need to replace the data every 2 months.

I have found a Python program that achieves the opposite goal, MERGE, here: https://github.com/google/transitfeed/blob/master/merge.py (this is a very good Python project, by the way).

I would be very thankful for any pointers.

Best regards,

Dunn.


Solution

  • It's worth noting that entries in stop_times.txt are usually the biggest memory hog when loading a GTFS feed. Since GTFS does not replicate trips and stop_times for each date on which those trips run, trimming the service calendar alone probably won't save you much; the savings only appear once the trips (and their stop_times) that fall outside the retained window are dropped as well, as sketched below.

    That said, there are some tools for slicing and dicing GTFS. Check out the OneBusAway GTFS Transformer tool, for example:

    http://developer.onebusaway.org/modules/onebusaway-gtfs-modules/1.3.3/onebusaway-gtfs-transformer-cli.html
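
    To make that point concrete, below is a minimal, unofficial sketch (Python standard library only) of the date-window split: it clamps calendar.txt to a window, keeps the matching calendar_dates.txt exceptions, and then drops the trips and stop_times that no longer run, which is where the real size reduction comes from. The file names, dates, and the split_window helper are invented for the example; a real tool such as the GTFS Transformer above handles related files (frequencies.txt, shapes, transfers) much more carefully.

        import csv
        import io
        import zipfile


        def split_window(src_zip, dst_zip, window_start, window_end):
            """Write a copy of src_zip whose service calendar is clamped to
            [window_start, window_end] (YYYYMMDD strings), dropping trips
            and stop_times that no longer run in that window."""
            with zipfile.ZipFile(src_zip) as zin, \
                 zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as zout:

                def read(name):
                    with zin.open(name) as f:
                        reader = csv.DictReader(io.TextIOWrapper(f, "utf-8-sig"))
                        rows = list(reader)
                        return rows, reader.fieldnames

                def write(name, rows, fieldnames):
                    buf = io.StringIO()
                    writer = csv.DictWriter(buf, fieldnames=fieldnames)
                    writer.writeheader()
                    writer.writerows(rows)
                    zout.writestr(name, buf.getvalue())

                names = zin.namelist()
                kept_services = set()

                # 1. Keep only services that overlap the window, clamped to it.
                #    YYYYMMDD strings compare correctly as plain strings.
                if "calendar.txt" in names:
                    rows, fields = read("calendar.txt")
                    kept = []
                    for row in rows:
                        if row["end_date"] < window_start or row["start_date"] > window_end:
                            continue
                        row["start_date"] = max(row["start_date"], window_start)
                        row["end_date"] = min(row["end_date"], window_end)
                        kept.append(row)
                        kept_services.add(row["service_id"])
                    write("calendar.txt", kept, fields)

                # 2. Keep calendar_dates.txt exceptions inside the window.
                if "calendar_dates.txt" in names:
                    rows, fields = read("calendar_dates.txt")
                    kept = [r for r in rows if window_start <= r["date"] <= window_end]
                    kept_services.update(r["service_id"] for r in kept)
                    write("calendar_dates.txt", kept, fields)

                # 3. Drop trips whose service no longer runs, then their stop_times.
                rows, fields = read("trips.txt")
                kept_trips = [t for t in rows if t["service_id"] in kept_services]
                kept_trip_ids = {t["trip_id"] for t in kept_trips}
                write("trips.txt", kept_trips, fields)

                # stop_times.txt is by far the largest file; on a small box you
                # would stream it row by row instead of loading it all at once.
                rows, fields = read("stop_times.txt")
                write("stop_times.txt",
                      [s for s in rows if s["trip_id"] in kept_trip_ids], fields)

                # 4. Copy every other file through unchanged (note this leaves
                #    frequencies.txt untouched, so it may reference dropped trips).
                done = {"calendar.txt", "calendar_dates.txt",
                        "trips.txt", "stop_times.txt"}
                for name in names:
                    if name not in done:
                        zout.writestr(name, zin.read(name))


        # Hypothetical example: first of three 2-month slices of a 6-month feed.
        split_window("full_feed.zip", "feed_part1.zip", "20240101", "20240229")

    Running this three times with consecutive 2-month windows gives the three smaller feeds described in the question; the transformer CLI linked above is the more robust option if you need anything beyond this basic case.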