sqlrangecommon-table-expressionlaglead

SQL Building time ranges from sequential columns


I have a table with three columns.
DAYOFWEEK, STARTTIME, and ENDTIME. For many of the rows (sorted by STARTTIME) the STARTTIME is equal to the previous ENDTIME. I need to create ranges where those times are concurrent.

Sample Data:

DAYOFWEEK STARTTIME ENDTIME
1 61200 62100
1 62100 63000
1 63000 63900
1 63900 64800
1 64800 65700
1 68800 69900
2 51000 52000

Based on the sample data I'm seeking:

DAYOFWEEK STARTTIME ENDTIME
1 61200 65700
1 68800 69900
2 51000 52000

I'm sure it can be done somehow with CTE's and/or LAG/LEAD but I just can't work it out. Any brilliant ideas out there? I'm limited to using SQL 2014. Thanks!


Solution

  • Well, then the previous did not work, and was therefore less than quick.

    You want to partition by the day of the week, but, if a previous end time overlaps the current start time, you want to start the partition anew.

    This requires grouping/partitioning your data by a value that is not contained in your data: you need to sessionise your data. Except in Vertica, you do that by adding a counter in one OLAP query that is at 1 if a condition is true and at 0 otherwise (w_counter in my case), and nest this into an outer OLAP query that collects the running sum of that counter (w_session in my case). On the way there, I collect the "real" start time of the newly built partition/group. Finally I can group by day of week and newly obtained session_id.

    WITH                                                                                                                                                                                                      
    indata(dayofweek,starttime,endtime) AS (
              SELECT 1,61200,62100
    UNION ALL SELECT 1,62100,63000
    UNION ALL SELECT 1,63000,63900
    UNION ALL SELECT 1,63900,64800
    UNION ALL SELECT 1,64800,65700
    UNION ALL SELECT 1,68800,69900
    UNION ALL SELECT 2,51000,52000
    )
    ,
    w_counters AS (
    SELECT
        *
      , CASE WHEN starttime - LAG(endtime) OVER (PARTITION BY dayofweek ORDER BY starttime) > 0
         THEN starttime
         ELSE MIN(starttime) OVER (PARTITION BY dayofweek)
        END AS daystarttime
      , CASE WHEN starttime - LAG(endtime) OVER (PARTITION BY dayofweek ORDER BY starttime) > 0
         THEN 1
         ELSE 0
        END AS startoverlap
      FROM indata
      -- ctl SELECT * from w_counters ORDER BY 1
      -- ctl  dayofweek | starttime | endtime | daystarttime | ctlstarttime | startoverlap 
      -- ctl -----------+-----------+---------+--------------+--------------+--------------
      -- ctl          1 |     61200 |   62100 |        61200 |        61200 |            0
      -- ctl          1 |     62100 |   63000 |        61200 |        61200 |            0
      -- ctl          1 |     63000 |   63900 |        61200 |        61200 |            0
      -- ctl          1 |     63900 |   64800 |        61200 |        61200 |            0
      -- ctl          1 |     64800 |   65700 |        61200 |        61200 |            0
      -- ctl          1 |     68800 |   69900 |        68800 |        61200 |            1
      -- ctl          2 |     51000 |   52000 |        51000 |        51000 |            0
    )
    ,
    w_session AS (
      SELECT
        dayofweek
      , starttime
      , endtime
      , daystarttime
      , SUM(startoverlap) OVER(PARTITION BY dayofweek ORDER BY starttime) AS session_id
      FROM w_counters
      -- ctl ) select * from w_session ORDER BY 1
      -- ctl  dayofweek | starttime | endtime | daystarttime | session_id 
      -- ctl -----------+-----------+---------+--------------+------------
      -- ctl          1 |     61200 |   62100 |        61200 |          0
      -- ctl          1 |     62100 |   63000 |        61200 |          0
      -- ctl          1 |     63000 |   63900 |        61200 |          0
      -- ctl          1 |     63900 |   64800 |        61200 |          0
      -- ctl          1 |     64800 |   65700 |        61200 |          0
      -- ctl          1 |     68800 |   69900 |        68800 |          1
      -- ctl          2 |     51000 |   52000 |        51000 |          0
    )
    SELECT
      dayofweek
    , MIN(daystarttime) AS starttime
    , MAX(endtime)      AS endtime
    FROM w_session
    GROUP BY
      dayofweek
    , session_id
    ORDER BY
      dayofweek
    , session_id
    ;
    -- out  dayofweek | starttime | endtime 
    -- out -----------+-----------+---------
    -- out          1 |     61200 |   65700
    -- out          1 |     68800 |   69900
    -- out          2 |     51000 |   52000