I am searching for a way to get the amount of active trips in a GTFS feed for each minute of a day through a postgresql query.
For every trip I have the start and endtimes (in seconds) in a denormalized table. It looks something like this:
The query to give me trips that are active for a given timeframe (e.g. here 43000 to 43600 seconds) looks like this:
SELECT
COUNT(trips.trip_id)
FROM denormalized_trips AS trips
LEFT JOIN gtfs_calendar_dates AS calendar_dates
ON calendar_dates.service_id = trips.service_id
AND calendar_dates.agency_key = trips.agency_key
AND date = '2017-07-03'
AND exception_type = 1
INNER JOIN gtfs_calendar AS calendar
ON trips.service_id = calendar.service_id
AND calendar.agency_key = trips.agency_key
AND calendar.wednesday = 1
WHERE (
trip_start_time BETWEEN 46800 AND 47100
AND '2017-07-03' BETWEEN calendar.start_date AND calendar.end_date
)
AND NOT EXISTS (
SELECT 0
FROM gtfs_calendar_dates AS date_exceptions
WHERE date = '2017-07-03'
AND date_exceptions.agency_key = trips.agency_key
AND date_exceptions.service_id = calendar.service_id
AND exception_type = 2
);
This will result in 12 trips that will start between 13:00pm and 13:05pm.
Now I want to do that for the whole day. I want to get the amount of trips that get active in an interval of e.g. 1 minute or maybe more like 5 minutes. I tried it with a loop but this seems to just give me 1 result back. Here is what I came up with for now:
CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS INTEGER AS
$BODY$
DECLARE
count INTEGER;
BEGIN
FOR counter IN 43130..50000 BY 60 LOOP
SELECT
COUNT(trips.trip_id)
INTO count
FROM denormalized_trips AS trips
LEFT JOIN gtfs_calendar_dates AS calendar_dates
ON calendar_dates.service_id = trips.service_id
AND calendar_dates.agency_key = trips.agency_key
AND date = '2017-07-03'
AND exception_type = 1
INNER JOIN gtfs_calendar AS calendar
ON trips.service_id = calendar.service_id
AND calendar.agency_key = trips.agency_key
AND calendar.wednesday = 1
WHERE (
trip_start_time BETWEEN counter AND counter + 60
AND '2017-07-03' BETWEEN calendar.start_date AND calendar.end_date
)
AND NOT EXISTS (
SELECT 0
FROM gtfs_calendar_dates AS date_exceptions
WHERE date = '2017-07-03'
AND date_exceptions.agency_key = trips.agency_key
AND date_exceptions.service_id = calendar.service_id
AND exception_type = 2
);
END LOOP;
RETURN count;
END;
$BODY$ LANGUAGE plpgsql STABLE;
The result of calling SELECT get_active_trips(1);
Now I would like to get something like a table or an array of results back instead of just 1 entry. How would I do that?
Any help is highly appreciated.
There are two syntaxes to create function returning set of values: returns setof <type>
and returns table(<columns definition>)
. Documentation.
There are also several ways to return those values from the plpgsql
function: return next
or return query
. Documentation.
So if you want just get the series of integers you could to rewrite your function in that way:
CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS SETOF INTEGER AS
$BODY$
DECLARE
count INTEGER;
BEGIN
FOR counter IN 43130..50000 BY 60 LOOP
SELECT
COUNT(trips.trip_id)
INTO count
FROM denormalized_trips AS trips
<rest of query here>
;
RETURN NEXT count;
END LOOP;
RETURN;
END;
$BODY$ LANGUAGE plpgsql STABLE;
or, using RETURN QUERY
:
CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS SETOF INTEGER AS
$BODY$
BEGIN
FOR counter IN 43130..50000 BY 60 LOOP
RETURN QUERY
SELECT
COUNT(trips.trip_id)
FROM denormalized_trips AS trips
<rest of query here>
;
END LOOP;
RETURN;
END;
$BODY$ LANGUAGE plpgsql STABLE;
In case if you want to return more then one column:
CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS TABLE (counter_value int, active_trips_count int) AS
$BODY$
BEGIN
FOR counter IN 43130..50000 BY 60 LOOP
SELECT
COUNT(trips.trip_id)
INTO active_trips_count
FROM denormalized_trips AS trips
<rest of query here>
;
counter_value := counter;
RETURN NEXT; -- There is no parameters, current values of counter_value and active_trips_count will be returned
END LOOP;
RETURN;
END;
$BODY$ LANGUAGE plpgsql STABLE;
or, using RETURN QUERY
:
CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS TABLE (counter_value int, active_trips_count int) AS
$BODY$
BEGIN
FOR counter IN 43130..50000 BY 60 LOOP
RETURN QUERY
SELECT
counter,
COUNT(trips.trip_id)
FROM denormalized_trips AS trips
<rest of query here>
;
END LOOP;
RETURN;
END;
$BODY$ LANGUAGE plpgsql STABLE;
Finally there is alternative declaration for returns table
:
CREATE OR REPLACE FUNCTION get_active_trips(
n int,
out counter_value int,
out active_trips_count int)
RETURNS SETOF RECORD AS
Update
But(!) I feeling that it is possible to simplify your task using single query, without loop.
Consider the following query (i'v used your simplified query from sqlfiddle):
select
counter,
count(trips.trip_id)
from
generate_series(43130, 50000, 60) as counter left join
denormalized_trips as trips on (trip_start_time between counter and counter + 60)
group by counter
order by counter;