I'm trying to find the number of revenue miles/kilometers of a "Route" by: day, month, and year; by querying a GTFS database that has the structure described here:
https://developers.google.com/transit/gtfs/reference
And see here for a very clear sketch of the structure:
http://blog.openplans.org/2012/08/the-openplans-guide-to-gtfs-data/
"Revenue distance traveled" definition:
("Available for passengers to use" distance)
The number of miles/kilometers traveled from the first actual bus stop where a passenger can board, to the last drop-off at the last bus stop, for that particular route and bus run. (then aggregated together for all service runs taken by all buses for that particular route)
-
"Revenue hours" definition:
("Available for passengers to use" time span)
The number of hours from the moment the vehicle arrives at the first bus stop, until the moment it drops off its last passenger at the last bus stop. (then aggregated together for all service runs taken by all buses for that particular route)
I'm using SQL Server/MSSQL. Though SQL Lite, or MySQL, or any SQL examples would be perfectly fine.
Basically, I need to be able to SELECT a route, and then correlate the data in the routes
, calendar_dates
, calendar
, stop-times
, stops
, and trips
tables to find how many miles/kilometers were covered from the first stop (stop_times
and stops
tables) to the last, how many hours elapsed, and find this for a particular service_id
(in trips
and calendar
tables), and then also for all service_id
s for a particular route, and be able to get all this for a particular date
(in calendar_dates
table), or spans of dates (day, month, 3-month period, year, etc).
If a few different queries are needed, that's fine. The revenue distance traveled per route, and the revenue hours per route, can be separate queries.
Has anyone who has done this before be willing to share their query structure for this or has anyone figured this out? Are there any examples of how to write this query? I've been looking everywhere online for weeks.
Here is a diagram image of the database I have created with all relationships shown in detail:
I have done exactly this for scheduled kms, by:
select t.route_id as id, r.route_short_name as route, sum(l.shape_dist/1000) as sched_kms
from gtfs_shape_lengths l
inner join gtfs_trips t on t.shape_id = l.shape_id
inner join gtfs_routes r on r.route_id = t.route_id
inner join gtfs_calendar c on t.service_id = c.service_id
where c.service_id ilike '%sat%'
group by t.route_id, r.route_short_name
union all
select 'total' as id, 'total_' as name,
sum(l.shape_dist/1000) as sched_kms
from gtfs_shape_lengths l
inner join gtfs_trips t on t.shape_id = l.shape_id
inner join gtfs_calendar c on t.service_id = c.service_id
where c.service_id ilike '%sat%'
order by sched_kms desc
Original writeup here: http://transitdata.net/using-gtfs-and-postgis-to-calculate-levels-of-scheduled-service/