I have a table (#tmstmp
) with two columns dt
(DATETIME
) and payload
(INT
). Eventually, I want to sum payload
for each 5 minute interval there is.
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
, #numbers;
CREATE TABLE #tmstmp (
dt DATETIME PRIMARY KEY
, payload INT NOT NULL
);
CREATE TABLE #numbers (
n INT PRIMARY KEY
);
WITH numbers (n) AS (
SELECT 0 AS n
UNION ALL
SELECT n + 1 AS n
FROM numbers
WHERE n < 100
)
INSERT
INTO #numbers
SELECT n
FROM numbers;
WITH rnd (mins, secs) AS (
SELECT n2.n AS mins
, CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS mins
FROM #numbers AS n1
, #numbers as n2
WHERE n1.n < 5
AND n2.n < 15
), tmstmp (dt) AS (
SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
FROM rnd
)
INSERT
INTO #tmstmp
SELECT DISTINCT dt
, -1 AS payload
FROM tmstmp
ORDER BY dt;
UPDATE #tmstmp
SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
WITH agg (slot, sum_payload) AS (
SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
, SUM(payload) AS sum_payload
FROM #tmstmp
GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
, DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
, sum_payload
FROM agg;
from | to | sum_payload |
---|---|---|
2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
2024-01-01 12:05:00 | 2024-01-01 12:10:00 | 106 |
2024-01-01 12:10:00 | 2024-01-01 12:15:00 | 95 |
I want, however, to have an entry for each interval in the range, that is from 12:00-12:05
, 12:01-12:06
, 12:02-12:07
, etc. until the last timeslot.
I can construct the limits in the whole range before and use that in a JOIN
like this:
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);
WITH ranges ([from], [to], slot) AS (
SELECT DATEADD(MINUTE, n, @start) AS [from]
, DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
, n AS slot
FROM #numbers
WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
SELECT slot
, [from]
, [to]
, dt
, payload
FROM #tmstmp
INNER JOIN ranges
ON [from] <= dt
AND dt < [to]
)
SELECT MIN([from]) AS [from]
, MAX([to]) AS [to]
, SUM(payload) AS sum_payload
FROM tm_mult
GROUP BY slot
ORDER BY slot;
from | to | sum_payload |
---|---|---|
2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
2024-01-01 12:01:00 | 2024-01-01 12:06:00 | 120 |
2024-01-01 12:02:00 | 2024-01-01 12:07:00 | 125 |
... | ... | ... |
2024-01-01 12:14:00 | 2024-01-01 12:19:00 | 19 |
While this works in this toy example, I have hundreds of thousands of timestamps in my real data and worst of all, I have little impact on the indices.
My gut feeling tells me that I will create quite some duplication with my inequality JOIN
, and I was wondering whether this is anyway the canonical way of doing it or whether there is a more SQL-onic way of doing it? (like the Pythonistas like to call certain code Pythonic if it uses the language inherent concepts rather than trying to solve it with general tools).
Window functions in SQL (WINDOW - microsoft.com / OVER - microsoft.com) are a great asset to add to your SQL toolbelt. Also particularly canonical; windows have been in since SQL Server 2005.
Below is an example:
SELECT
[From],
DATEADD(MINUTE, 1, [To]) [To],
payload
FROM (
SELECT
dt,
MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
dt [To],
SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
FROM (
SELECT
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt,
SUM(payload) payload
FROM #tmstmp
GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
) q
) q
WHERE DATEDIFF(MINUTE, [From], [To]) > 3
I'd like to draw attention to both the 4 PRECEDING
and DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
. As the latter practically floors the datetime to the minute, 2024-01-01 12:04:00.000
is inclusive up to 2024-01-01 12:04:59.999
, but it doesn't include 2024-01-01 12:05:00.000
. Hopefully that's the functionality you are looking for.
Here is a fiddle.