This is a tough one I've been fighting with for a bit. I have a table called action_events
that stores event_name
and timestamp
:
CREATE TABLE action_events (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMP NOT NULL,
event_name VARCHAR(255) NOT NULL
);
I want to grab the most recent "streak" for an event with a given name.
A "streak" is the number of consecutive days that the event has occurred at least once. An event may occur more than once a day. Big gotcha: The streak should also take into account a given timezone.
Using this fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/7828
Given a query for "exercise" in "MST" timezone, I'd expect the streak to be:
Streak Count | Name | TimeZone | Start Date | End Date |
---|---|---|---|---|
13 | "exercise" | "MST" | 2023-02-18 09:00:00 | 2023-02-30 09:00:00 |
Even though the streak ended a month ago, it should still show as the most recent streak.
Plain timestamp
data is ignorant of time zones. A given time zone is meaningless while we don't know the time zone of the timestamp data.
I'll assume your timestamps are supposed to represent UTC. Should really be timestamptz
to avoid ambiguity and an additional transformation.
Plain SQL. The query won't get much snappier than this:
SELECT sum(ct) AS streak_count
, 'exercise' AS event_name
, 'MST' AS timezone
, min(min_ts) AS start_date
, max(max_ts) AS end_date
FROM (
SELECT *, the_day - row_number() OVER (ORDER BY the_day)::int AS streak
FROM (
SELECT (timestamp AT TIME ZONE 'UTC' AT TIME ZONE 'MST')::date AS the_day
, count(*) AS ct
, min(timestamp) AS min_ts
, max(timestamp) AS max_ts
FROM action_events
WHERE event_name = 'exercise'
GROUP BY 1
) sub1
) sub2
GROUP BY streak
ORDER BY end_date DESC
LIMIT 1;
sub1
Filter only the event_name
of interest right away - event_name = 'exercise'
in the example.
timestamp AT TIME ZONE 'UTC'
produces timestamptz
.
... AT TIME ZONE 'MST'
then returns the corresponding timestamp
at 'MST'. See:
Cast to ::date
(the_day
), and group by that.
So we get (at most) one row per day and carry along count, min, and max.
sub2
To identify streaks, simply subtract the row number (type integer
!) from the_day
(type date
!). See:
This produces the same (otherwise meaningless) day for consecutive days.
(sub1
and sub2
could be merged, but that'd be unwieldy.)
SELECT
Aggregate per streak, take sum of counts, min of min, and max of max.
Sort by min (or max, same result) descending and take the first row (LIMIT 1
).
Voilá.
In the result, start_date
and end_date
represent original timestamps (in UTC). You might want to show timestamps at 'MST' instead. You didn't say.
No minimum streak length has been defined, so it could be just a single row.
But since this processes all rows for the given event, it does not scale well for many rows.
It will be (much) faster to start with the latest row and loop until a gap is encountered. While no minimum streak length is required, we can't go wrong.
Demonstrating a PL/pgSQL function:
CREATE OR REPLACE FUNCTION f_latest_streak(
INOUT name text
, INOUT timezone text
, OUT steak_count int
, OUT start_date timestamp
, OUT end_date timestamp)
LANGUAGE plpgsql STRICT PARALLEL SAFE STABLE AS
$func$
DECLARE
_day_start timestamp;
_start_date timestamp;
_streak_step int;
BEGIN
-- get end & UTC time for start of latest day at given time zone
SELECT max(a.timestamp)
, date_trunc('day', max(a.timestamp) AT TIME ZONE 'UTC' AT TIME ZONE timezone)
AT TIME ZONE 'UTC' AT TIME ZONE 'UTC' -- sic!
INTO end_date, _day_start
FROM action_events a
WHERE event_name = name;
IF NOT FOUND THEN -- no rows at all
RETURN;
END IF;
-- get count for first day
SELECT count(*)::int, min(timestamp)
INTO steak_count, start_date
FROM action_events a
WHERE a.event_name = name -- careful with naming conflicts!
AND a.timestamp >= _day_start;
-- more days?
LOOP
SELECT count(*)::int, min(timestamp)
INTO _streak_step, _start_date
FROM action_events a
WHERE a.event_name = name -- careful with naming conflicts!
AND a.timestamp >= _day_start - interval '1 day'
AND a.timestamp < _day_start;
IF _streak_step = 0 THEN -- streak ends here
RETURN;
ELSE
steak_count := steak_count + _streak_step;
start_date := _start_date;
_day_start := _day_start - interval '1 day';
END IF;
END LOOP;
END
$func$;
Call:
SELECT * FROM f_latest_streak('exercise', 'MST');
A multicolumn index on (event_name, timestamp)
will make the function fast.
You should be comfortable with PL/pgSQL to play with this.