sqlmysqlintervalsgaps-and-islands

Select rows without overlap / dynamic 1 hour intervals


time_start time_end
'2024-01-01 12:30' '2024-01-01 13:30'
'2024-01-01 12:40' '2024-01-01 13:40'
'2024-01-01 12:45' '2024-01-01 13:45'
'2024-01-01 13:36' '2024-01-01 14:36'
'2024-01-01 13:50' '2024-01-01 14:50'
'2024-01-01 14:30' '2024-01-01 15:30'
'2024-01-01 15:35' '2024-01-01 16:35'
'2024-01-01 16:30' '2024-01-01 17:30'
'2024-01-01 17:30' '2024-01-01 18:30'

Every interval is an event-period. I need the first event to be shown and then the second event should be the next event which has a interval that doesn't overlap with the interval of the first event. Then the third event should be the next event that doesn't overlap with the second event and so on.

Expected output:

time_start time_end
'2024-01-01 12:30' '2024-01-01 13:30'
'2024-01-01 13:36' '2024-01-01 14:36'
'2024-01-01 15:35' '2024-01-01 16:35'
'2024-01-01 17:30' '2024-01-01 18:30'

I have tried something like this, but it didnt give me the expected output:

WITH Intervalle AS (
    SELECT
        start_time,
        end_time,
        LAG(end_time) OVER (ORDER BY start_time) AS previous_end_time
    FROM
        deine_tabelle
)
SELECT
    start_time,
    end_time
FROM
    Intervalle
WHERE
    previous_end_time IS NULL
    OR start_time >= previous_end_time + INTERVAL 1 HOUR
ORDER BY
    start_time;

Solution

  • I am going to assume MySQL based on the INTERVAL 1 HOUR expression in your posted query.

    This is a variation of the gaps-and-islands problem. However, unlike typical gaps-and-islands cases, I do not believe you can use LAG() and LEAD() functions to directly identify the "gaps. This is because the status of each row potentially depends on the results for every prior row, not just the adjacent rows. There are other approaches...

    You can use a recursive CTE to select the first row and then repeatedly select the next eligible following row - both using subqueries that select the next "best" row using a combination of ORDER BY and LIMIT 1.

    WITH RECURSIVE SelectedIntervals AS (
        SELECT I2.*
        FROM (
            SELECT I.*
            FROM Intervalle I
            ORDER BY I.time_start
            LIMIT 1
        ) I2
    
        UNION ALL
    
        SELECT I2.*
        FROM SelectedIntervals S
        , LATERAL (
            SELECT I.time_start, I.time_end
            FROM Intervalle I
            WHERE I.time_start >= S.time_start + INTERVAL 1 HOUR
            ORDER BY I.time_start
            LIMIT 1
        ) I2
    )
    SELECT *
    FROM SelectedIntervals
    

    Results:

    time_start time_end
    2024-01-01 12:30:00 2024-01-01 13:30:00
    2024-01-01 13:36:00 2024-01-01 14:36:00
    2024-01-01 15:35:00 2024-01-01 16:35:00
    2024-01-01 17:30:00 2024-01-01 18:30:00

    If you wish to use a gaps and islands approach, you can process all of the rows in sequence while tracking the time_start of each "island". A comparison of each time_start with the current island start can be used to identify "gaps". A running sum of the gap flags can be used to number the islands.

    WITH RECURSIVE TaggedIntervals AS (
        SELECT
            I2.*,
            1 AS IsGap,
            1 AS Island,
            I2.time_start AS ref_time
        FROM (
            SELECT I.*
            FROM Intervalle I
            ORDER BY I.time_start
            LIMIT 1
        ) I2
    
        UNION ALL
    
        SELECT
            I2.*,
            T.Island + I2.IsGap AS Island,
            CASE WHEN I2.IsGap = 1 THEN I2.time_start ELSE T.ref_time END AS ref_time
        FROM TaggedIntervals T
        , LATERAL (
            SELECT
                I.*,
                CASE WHEN I.time_start >= T.ref_time + INTERVAL 1 HOUR
                     THEN 1 ELSE 0 END AS IsGap
            FROM Intervalle I
            WHERE I.time_start > T.time_start
            ORDER BY I.time_start
            LIMIT 1
        ) I2
    )
    SELECT *
    FROM TaggedIntervals
    

    This assumes that there are no duplicate times. If you need to handle such cases, you may need to first number the rows using ROW_NUMBER() and then use that row number to sequence the rows.

    Results:

    time_start time_end IsGap Island ref_time
    2024-01-01 12:30:00 2024-01-01 13:30:00 1 1 2024-01-01 12:30:00
    2024-01-01 12:40:00 2024-01-01 13:40:00 0 1 2024-01-01 12:30:00
    2024-01-01 12:45:00 2024-01-01 13:45:00 0 1 2024-01-01 12:30:00
    2024-01-01 13:36:00 2024-01-01 14:36:00 1 2 2024-01-01 13:36:00
    2024-01-01 13:50:00 2024-01-01 14:50:00 0 2 2024-01-01 13:36:00
    2024-01-01 14:30:00 2024-01-01 15:30:00 0 2 2024-01-01 13:36:00
    2024-01-01 15:35:00 2024-01-01 16:35:00 1 3 2024-01-01 15:35:00
    2024-01-01 16:30:00 2024-01-01 17:30:00 0 3 2024-01-01 15:35:00
    2024-01-01 17:30:00 2024-01-01 18:30:00 1 4 2024-01-01 17:30:00

    At this point, you can add logic to group by Island and apply whatever aggregation functions needed to achieve your desired results.

    See this db<>fiddle for a demo.