sqlpostgresqlgroup-byaveragegenerate-series

How to get average values for time intervals in Postgres


I'm using PostgreSQL 9.6. I have a table like this:

mac   sn         loc   time     date      vin1    vin2    vin3
1a34 4as11111111 aaaa  7:06:18  1/1/2018  447.42  472.32  682.59
1a34 4as11111111 aaaa  7:06:43  1/1/2018  455.97  476.25  682.59
1a34 4as11111111 aaaa  7:07:35  1/1/2018  470.88  484.2   682.5

I need to calculate the average of the vin1, vin2, vin3 within time intervals of 300 sec (5 min). For example, starting from the first time (7:06:18 - 7:11:18), for the dates in range. I can select the data I need with this query:

select * from table
where sn='4as11111111' and date between '2018-01-01' and '2018-01-02';

But I don't know how to group it by 300 sec time intervals and calculate average for vin1, vin2, vin3 columns for those 5 min interval, to get something like this:

mac  sn          loc     time     date      vin1_av  vin2_av  vin3_av
1a34 4as11111111 aaaa   7:06:18  1/1/2018  450.0    480.32   600.59
1a34 4as11111111 aaaa   7:11:18  1/1/2018  460.0    490.25   782.59
1a34 4as11111111 aaaa   7:16:18  1/1/2018  470.88   500.2    600.5

Any help would be greatly appreciated.


Solution

  • DB design

    While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:

    ALTER TABLE tbl ADD column ts timestamp;
    UPDATE tbl SET ts = date + time;  -- assuming actual date and time types
    ALTER TABLE tbl DROP column date, DROP column time;
    

    If date and time are not actual date and time data types, use to_timestamp(). Related:

    Query

    Then the query is a bit simpler:

    SELECT *
    FROM  (
       SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
       FROM   tbl
       WHERE  sn = '4as11111111'
       AND    ts >= '2018-01-01'
       AND    ts <  '2018-01-02'
       GROUP  BY 1
       ) grid
    CROSS  JOIN LATERAL (
       SELECT round(avg(vin1), 2) AS vin1_av
            , round(avg(vin2), 2) AS vin2_av
            , round(avg(vin3), 2) AS vin3_av
       FROM   tbl
       WHERE  sn =  grid.sn
       AND    ts >= grid.ts
       AND    ts <  grid.ts + interval '5 min'
       ) avg;
    

    db<>fiddle here

    Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.

    Join to rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL in this case.

    The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too. Like including all times slots in the given time frame or just time slots with actual values. All possible, I had to pick one interpretation.

    Index

    At least have this multicolumn index:

    CRATE INDEX foo_idx ON tbl (sn, ts);
    

    Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans - if some preconditions are met and especially if table rows are much wider than in the demo.

    Closely related:

    Based on your original table

    As requested and clarified in the comment, and later updated again in the question to include the columns mac and loc. I assume you want separate averages per (mac, loc).

    date and time are still separate columns, vin* columns are type float, and exclude time slots without rows:

    The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:

    SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
         , t.vin1_av, t.vin2_av, t.vin3_av
    FROM  (SELECT text '4as11111111') sn(sn)  -- provide sn here once
    CROSS  JOIN LATERAL (
       SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
       FROM   tbl
       WHERE  sn = sn.sn
       AND    date+time >= '2018-01-01 0:0'   -- provide time frame here
       AND    date+time <  '2018-01-02 0:0'
       ) grid
    CROSS  JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
    CROSS  JOIN LATERAL (
       SELECT mac, loc
            , round(avg(vin1)::numeric, 2) AS vin1_av  -- cast to numeric for round()
            , round(avg(vin2)::numeric, 2) AS vin2_av  -- but rounding is optional
            , round(avg(vin3)::numeric, 2) AS vin3_av
       FROM   tbl
       WHERE  sn = sn.sn
       AND    date+time >= ts.ts
       AND    date+time <  ts.ts + interval '5 min'
       GROUP  BY mac, loc
       HAVING count(*) > 0  -- exclude empty slots
       ) t;
    

    Create a multicolumn expression index to support this:

    CRATE INDEX bar_idx ON tbl (sn, (date+time));
    

    db<>fiddle here

    But I would much rather use timestamp all along.