sqlpostgresqlcountsubquery

Rolling count of rows withing time interval


For an analysis I need to aggregate the rows of a single table depending on their creation time. Basically, I want to know the count of orders that have been created within a certain period of time before the current order. Can't seem to find the solution to this.

Table structure:

order_id time_created
1 00:00
2 00:01
3 00:03
4 00:05
5 00:10

Expected result:

order_id count within 3 seconds
1 1
2 2
3 3
4 2
5 1

Solution

  • Postgres 11 or newer

    Simpler now:

    SELECT t.*  -- or whatever you need
         , count(*) OVER (ORDER BY t.time_created
                          RANGE interval '3 sec' PRECEDING) AS count_within_3_sec
    FROM   tbl t;
    

    fiddle

    See:

    Postgres 10 or older

    Sounds like an application for window functions. But window frames can only be based on row counts, not on actual column values.

    A simple query with LEFT JOIN can do the job:

    SELECT t0.order_id
         , count(t1.time_created) AS count_within_3_sec
    FROM   tbl      t0
    LEFT   JOIN tbl t1 ON t1.time_created BETWEEN t0.time_created - interval '3 sec'
                                              AND t0.time_created
    GROUP  BY 1
    ORDER  BY 1;
    

    db<>fiddle here

    Does not work with time like in your minimal demo, as that does not wrap around. I suppose it's reasonable to assume timestamp or timestamptz.

    Since you include each row itself in the count, an INNER JOIN would work, too. (LEFT JOIN is still more reliable in the face of possible NULL values.)

    Or use a LATERAL subquery and you don't need to aggregate on the outer query level:

    SELECT t0.order_id
         , t1.count_within_3_sec
    FROM   tbl t0
    LEFT   JOIN LATERAL (
       SELECT count(*) AS count_within_3_sec
       FROM   tbl t1
       WHERE  t1.time_created BETWEEN t0.time_created - interval '3 sec'
                                  AND t0.time_created
       ) t1 ON true
    ORDER  BY 1;
    

    Related:

    For big tables and many rows in the time frame, a procedural solution that walks through the table once will perform better. Like: