I have a table with three columns.
DAYOFWEEK, STARTTIME, and ENDTIME.
For many of the rows (sorted by STARTTIME) the STARTTIME is equal to the previous ENDTIME. I need to create ranges where those times are concurrent.
Sample Data:
DAYOFWEEK | STARTTIME | ENDTIME |
---|---|---|
1 | 61200 | 62100 |
1 | 62100 | 63000 |
1 | 63000 | 63900 |
1 | 63900 | 64800 |
1 | 64800 | 65700 |
1 | 68800 | 69900 |
2 | 51000 | 52000 |
Based on the sample data I'm seeking:
DAYOFWEEK | STARTTIME | ENDTIME |
---|---|---|
1 | 61200 | 65700 |
1 | 68800 | 69900 |
2 | 51000 | 52000 |
I'm sure it can be done somehow with CTE's and/or LAG/LEAD but I just can't work it out. Any brilliant ideas out there? I'm limited to using SQL 2014. Thanks!
Well, then the previous did not work, and was therefore less than quick.
You want to partition by the day of the week, but, if a previous end time overlaps the current start time, you want to start the partition anew.
This requires grouping/partitioning your data by a value that is not contained in your data: you need to sessionise your data. Except in Vertica, you do that by adding a counter in one OLAP query that is at 1 if a condition is true and at 0 otherwise (w_counter
in my case), and nest this into an outer OLAP query that collects the running sum of that counter (w_session
in my case). On the way there, I collect the "real" start time of the newly built partition/group. Finally I can group by day of week and newly obtained session_id
.
WITH
indata(dayofweek,starttime,endtime) AS (
SELECT 1,61200,62100
UNION ALL SELECT 1,62100,63000
UNION ALL SELECT 1,63000,63900
UNION ALL SELECT 1,63900,64800
UNION ALL SELECT 1,64800,65700
UNION ALL SELECT 1,68800,69900
UNION ALL SELECT 2,51000,52000
)
,
w_counters AS (
SELECT
*
, CASE WHEN starttime - LAG(endtime) OVER (PARTITION BY dayofweek ORDER BY starttime) > 0
THEN starttime
ELSE MIN(starttime) OVER (PARTITION BY dayofweek)
END AS daystarttime
, CASE WHEN starttime - LAG(endtime) OVER (PARTITION BY dayofweek ORDER BY starttime) > 0
THEN 1
ELSE 0
END AS startoverlap
FROM indata
-- ctl SELECT * from w_counters ORDER BY 1
-- ctl dayofweek | starttime | endtime | daystarttime | ctlstarttime | startoverlap
-- ctl -----------+-----------+---------+--------------+--------------+--------------
-- ctl 1 | 61200 | 62100 | 61200 | 61200 | 0
-- ctl 1 | 62100 | 63000 | 61200 | 61200 | 0
-- ctl 1 | 63000 | 63900 | 61200 | 61200 | 0
-- ctl 1 | 63900 | 64800 | 61200 | 61200 | 0
-- ctl 1 | 64800 | 65700 | 61200 | 61200 | 0
-- ctl 1 | 68800 | 69900 | 68800 | 61200 | 1
-- ctl 2 | 51000 | 52000 | 51000 | 51000 | 0
)
,
w_session AS (
SELECT
dayofweek
, starttime
, endtime
, daystarttime
, SUM(startoverlap) OVER(PARTITION BY dayofweek ORDER BY starttime) AS session_id
FROM w_counters
-- ctl ) select * from w_session ORDER BY 1
-- ctl dayofweek | starttime | endtime | daystarttime | session_id
-- ctl -----------+-----------+---------+--------------+------------
-- ctl 1 | 61200 | 62100 | 61200 | 0
-- ctl 1 | 62100 | 63000 | 61200 | 0
-- ctl 1 | 63000 | 63900 | 61200 | 0
-- ctl 1 | 63900 | 64800 | 61200 | 0
-- ctl 1 | 64800 | 65700 | 61200 | 0
-- ctl 1 | 68800 | 69900 | 68800 | 1
-- ctl 2 | 51000 | 52000 | 51000 | 0
)
SELECT
dayofweek
, MIN(daystarttime) AS starttime
, MAX(endtime) AS endtime
FROM w_session
GROUP BY
dayofweek
, session_id
ORDER BY
dayofweek
, session_id
;
-- out dayofweek | starttime | endtime
-- out -----------+-----------+---------
-- out 1 | 61200 | 65700
-- out 1 | 68800 | 69900
-- out 2 | 51000 | 52000