sqlsql-serverwindow-functions

Get sum for each 5 minute time interval


Problem Description

I have a table (#tmstmp) with two columns dt (DATETIME) and payload (INT). Eventually, I want to sum payload for each 5 minute interval there is.

Code

Setup

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
                     , #numbers;
CREATE TABLE #tmstmp (
  dt DATETIME PRIMARY KEY
  , payload INT NOT NULL
);

CREATE TABLE #numbers (
  n INT PRIMARY KEY
);
WITH numbers (n) AS (
  SELECT 0 AS n
  UNION ALL
  SELECT n + 1 AS n
    FROM numbers
   WHERE n < 100
)
INSERT
  INTO #numbers
SELECT n
  FROM numbers;

WITH rnd (mins, secs) AS (
  SELECT n2.n AS mins
         , CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS mins
   FROM #numbers AS n1
        , #numbers as n2
  WHERE n1.n < 5
    AND n2.n < 15
), tmstmp (dt) AS (
  SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
    FROM rnd
)
INSERT
  INTO #tmstmp
SELECT DISTINCT dt
       , -1 AS payload
  FROM tmstmp
 ORDER BY dt;

UPDATE #tmstmp
   SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO

Non overlapping timeslots are easy

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;

WITH agg (slot, sum_payload) AS (
  SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
         , SUM(payload) AS sum_payload
    FROM #tmstmp
   GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
       , DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
       , sum_payload
  FROM agg;
from to sum_payload
2024-01-01 12:00:00 2024-01-01 12:05:00 124
2024-01-01 12:05:00 2024-01-01 12:10:00 106
2024-01-01 12:10:00 2024-01-01 12:15:00 95

Ultimate Goal: get running timeslots

I want, however, to have an entry for each interval in the range, that is from 12:00-12:05, 12:01-12:06, 12:02-12:07, etc. until the last timeslot.

I can construct the limits in the whole range before and use that in a JOIN like this:

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);

WITH ranges ([from], [to], slot) AS (
  SELECT DATEADD(MINUTE, n, @start) AS [from]
         , DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
         , n AS slot
    FROM #numbers
   WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
  SELECT slot
         , [from]
         , [to]
         , dt
         , payload
    FROM #tmstmp
   INNER JOIN ranges
      ON [from] <= dt
     AND dt < [to]
)
SELECT MIN([from]) AS [from]
       , MAX([to]) AS [to]
       , SUM(payload) AS sum_payload
  FROM tm_mult
 GROUP BY slot
 ORDER BY slot;
from to sum_payload
2024-01-01 12:00:00 2024-01-01 12:05:00 124
2024-01-01 12:01:00 2024-01-01 12:06:00 120
2024-01-01 12:02:00 2024-01-01 12:07:00 125
... ... ...
2024-01-01 12:14:00 2024-01-01 12:19:00 19

While this works in this toy example, I have hundreds of thousands of timestamps in my real data and worst of all, I have little impact on the indices.

My gut feeling tells me that I will create quite some duplication with my inequality JOIN, and I was wondering whether this is anyway the canonical way of doing it or whether there is a more SQL-onic way of doing it? (like the Pythonistas like to call certain code Pythonic if it uses the language inherent concepts rather than trying to solve it with general tools).


Solution

  • Window functions in SQL (WINDOW - microsoft.com / OVER - microsoft.com) are a great asset to add to your SQL toolbelt. Also particularly canonical; windows have been in since SQL Server 2005.

    Below is an example:

    SELECT
        [From],
        DATEADD(MINUTE, 1, [To]) [To],
        payload
    FROM (
        SELECT
            dt,
            MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
            dt [To],
            SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
        FROM (
            SELECT
                DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt,
                SUM(payload) payload
            FROM #tmstmp
            GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
        ) q
    ) q
    WHERE DATEDIFF(MINUTE, [From], [To]) > 3
    

    I'd like to draw attention to both the 4 PRECEDING and DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0). As the latter practically floors the datetime to the minute, 2024-01-01 12:04:00.000 is inclusive up to 2024-01-01 12:04:59.999, but it doesn't include 2024-01-01 12:05:00.000. Hopefully that's the functionality you are looking for.

    Here is a fiddle.