time_start | time_end |
---|---|
'2024-01-01 12:30' | '2024-01-01 13:30' |
'2024-01-01 12:40' | '2024-01-01 13:40' |
'2024-01-01 12:45' | '2024-01-01 13:45' |
'2024-01-01 13:36' | '2024-01-01 14:36' |
'2024-01-01 13:50' | '2024-01-01 14:50' |
'2024-01-01 14:30' | '2024-01-01 15:30' |
'2024-01-01 15:35' | '2024-01-01 16:35' |
'2024-01-01 16:30' | '2024-01-01 17:30' |
'2024-01-01 17:30' | '2024-01-01 18:30' |
Every interval is an event-period. I need the first event to be shown and then the second event should be the next event which has a interval that doesn't overlap with the interval of the first event. Then the third event should be the next event that doesn't overlap with the second event and so on.
Expected output:
time_start | time_end |
---|---|
'2024-01-01 12:30' | '2024-01-01 13:30' |
'2024-01-01 13:36' | '2024-01-01 14:36' |
'2024-01-01 15:35' | '2024-01-01 16:35' |
'2024-01-01 17:30' | '2024-01-01 18:30' |
I have tried something like this, but it didnt give me the expected output:
WITH Intervalle AS (
SELECT
start_time,
end_time,
LAG(end_time) OVER (ORDER BY start_time) AS previous_end_time
FROM
deine_tabelle
)
SELECT
start_time,
end_time
FROM
Intervalle
WHERE
previous_end_time IS NULL
OR start_time >= previous_end_time + INTERVAL 1 HOUR
ORDER BY
start_time;
I am going to assume MySQL based on the INTERVAL 1 HOUR
expression in your posted query.
This is a variation of the gaps-and-islands problem. However, unlike typical gaps-and-islands cases, I do not believe you can use LAG()
and LEAD()
functions to directly identify the "gaps. This is because the status of each row potentially depends on the results for every prior row, not just the adjacent rows. There are other approaches...
You can use a recursive CTE to select the first row and then repeatedly select the next eligible following row - both using subqueries that select the next "best" row using a combination of ORDER BY
and LIMIT 1
.
WITH RECURSIVE SelectedIntervals AS (
SELECT I2.*
FROM (
SELECT I.*
FROM Intervalle I
ORDER BY I.time_start
LIMIT 1
) I2
UNION ALL
SELECT I2.*
FROM SelectedIntervals S
, LATERAL (
SELECT I.time_start, I.time_end
FROM Intervalle I
WHERE I.time_start >= S.time_start + INTERVAL 1 HOUR
ORDER BY I.time_start
LIMIT 1
) I2
)
SELECT *
FROM SelectedIntervals
Results:
time_start | time_end |
---|---|
2024-01-01 12:30:00 | 2024-01-01 13:30:00 |
2024-01-01 13:36:00 | 2024-01-01 14:36:00 |
2024-01-01 15:35:00 | 2024-01-01 16:35:00 |
2024-01-01 17:30:00 | 2024-01-01 18:30:00 |
If you wish to use a gaps and islands approach, you can process all of the rows in sequence while tracking the time_start
of each "island". A comparison of each time_start
with the current island start can be used to identify "gaps". A running sum of the gap flags can be used to number the islands.
WITH RECURSIVE TaggedIntervals AS (
SELECT
I2.*,
1 AS IsGap,
1 AS Island,
I2.time_start AS ref_time
FROM (
SELECT I.*
FROM Intervalle I
ORDER BY I.time_start
LIMIT 1
) I2
UNION ALL
SELECT
I2.*,
T.Island + I2.IsGap AS Island,
CASE WHEN I2.IsGap = 1 THEN I2.time_start ELSE T.ref_time END AS ref_time
FROM TaggedIntervals T
, LATERAL (
SELECT
I.*,
CASE WHEN I.time_start >= T.ref_time + INTERVAL 1 HOUR
THEN 1 ELSE 0 END AS IsGap
FROM Intervalle I
WHERE I.time_start > T.time_start
ORDER BY I.time_start
LIMIT 1
) I2
)
SELECT *
FROM TaggedIntervals
This assumes that there are no duplicate times. If you need to handle such cases, you may need to first number the rows using ROW_NUMBER()
and then use that row number to sequence the rows.
Results:
time_start | time_end | IsGap | Island | ref_time |
---|---|---|---|---|
2024-01-01 12:30:00 | 2024-01-01 13:30:00 | 1 | 1 | 2024-01-01 12:30:00 |
2024-01-01 12:40:00 | 2024-01-01 13:40:00 | 0 | 1 | 2024-01-01 12:30:00 |
2024-01-01 12:45:00 | 2024-01-01 13:45:00 | 0 | 1 | 2024-01-01 12:30:00 |
2024-01-01 13:36:00 | 2024-01-01 14:36:00 | 1 | 2 | 2024-01-01 13:36:00 |
2024-01-01 13:50:00 | 2024-01-01 14:50:00 | 0 | 2 | 2024-01-01 13:36:00 |
2024-01-01 14:30:00 | 2024-01-01 15:30:00 | 0 | 2 | 2024-01-01 13:36:00 |
2024-01-01 15:35:00 | 2024-01-01 16:35:00 | 1 | 3 | 2024-01-01 15:35:00 |
2024-01-01 16:30:00 | 2024-01-01 17:30:00 | 0 | 3 | 2024-01-01 15:35:00 |
2024-01-01 17:30:00 | 2024-01-01 18:30:00 | 1 | 4 | 2024-01-01 17:30:00 |
At this point, you can add logic to group by Island
and apply whatever aggregation functions needed to achieve your desired results.
See this db<>fiddle for a demo.