I have a table like below:
C_ID | O_ID | T_START | T_STOP |
---|---|---|---|
1 | 1 | 5/7/25 12:06 | 5/8/25 11:09 |
1 | 2 | 5/8/25 11:24 | 5/10/25 11:24 |
1 | 3 | 5/10/25 15:41 | 5/11/25 12:38 |
1 | 4 | 5/14/25 11:24 | 5/16/25 07:20 |
1 | 5 | 5/16/25 08:50 | 5/18/25 08:50 |
1 | 6 | 5/18/25 09:24 | 5/20/25 09:24 |
2 | 1 | 5/10/25 11:42 | 5/12/25 11:42 |
2 | 2 | 5/12/25 11:44 | 5/14/25 11:43 |
2 | 3 | 5/18/25 00:02 | 5/19/25 02:54 |
I would like to be able to combine some of the rows (including aggregating O_ID's and replacing T_STOP's with the MAX) based on the parameter that given C_ID is the same and the T_STOP of one row is within 2 hours of the next T_START. So, given the above table, the result should be:
C_ID | O_ID | T_START | T_STOP |
---|---|---|---|
1 | 1, 2 | 5/7/25 12:06 | 5/10/25 11:24 |
1 | 3 | 5/10/25 15:41 | 5/11/25 12:38 |
1 | 4, 5, 6 | 5/14/25 11:24 | 5/20/25 09:24 |
2 | 1, 2 | 5/10/25 11:42 | 5/14/25 11:43 |
2 | 3 | 5/18/25 00:02 | 5/19/25 02:54 |
I figure there has to be some recursion involved, but I can't quite wrap my head around it.
From Oracle 12, you can use MATCH_RECOGNIZE
for row-by-row pattern matching to find the groups of rows within 2-hours of each other; once you have found the groups you can aggregate:
SELECT c_id,
LISTAGG(o_id, ',') WITHIN GROUP (ORDER BY o_id) AS o_ids,
MIN(t_start) AS t_start,
MAX(t_stop) AS t_stop
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY c_id
ORDER BY o_id
MEASURES
MATCH_NUMBER() AS mn
ALL ROWS PER MATCH
PATTERN (first_row within_2_hours*)
DEFINE
within_2_hours AS t_start <= PREV(t_stop) + INTERVAL '2' HOUR
)
GROUP BY c_id, mn
In Oracle 11, you can use analytic functions to find the groups:
SELECT c_id,
LISTAGG(o_id, ',') WITHIN GROUP (ORDER BY o_id) AS o_ids,
MIN(t_start) AS t_start,
MAX(t_stop) AS t_stop
FROM (
SELECT c_id,
o_id,
t_start,
t_stop,
SUM(has_changed_group) OVER (PARTITION BY c_id ORDER BY o_id) AS mn
FROM (
SELECT t.*,
CASE
WHEN t_start
<= LAG(t_stop) OVER (PARTITION BY c_id ORDER BY o_id) + INTERVAL '2' HOUR
THEN 0
ELSE 1
END AS has_changed_group
FROM table_name t
)
)
GROUP BY c_id, mn
Which, for the sample data:
CREATE TABLE table_name (C_ID, O_ID, T_START, T_STOP) AS
SELECT 1, 1, DATE '2025-05-07' + INTERVAL '12:06' HOUR TO MINUTE, DATE '2025-05-08' + INTERVAL '11:09' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, 2, DATE '2025-05-08' + INTERVAL '11:24' HOUR TO MINUTE, DATE '2025-05-10' + INTERVAL '11:24' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, 3, DATE '2025-05-10' + INTERVAL '15:41' HOUR TO MINUTE, DATE '2025-05-11' + INTERVAL '12:38' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, 4, DATE '2025-05-14' + INTERVAL '11:24' HOUR TO MINUTE, DATE '2025-05-16' + INTERVAL '07:20' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, 5, DATE '2025-05-16' + INTERVAL '08:50' HOUR TO MINUTE, DATE '2025-05-18' + INTERVAL '08:50' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, 6, DATE '2025-05-18' + INTERVAL '09:24' HOUR TO MINUTE, DATE '2025-05-20' + INTERVAL '09:24' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, 1, DATE '2025-05-10' + INTERVAL '11:42' HOUR TO MINUTE, DATE '2025-05-12' + INTERVAL '11:42' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, 2, DATE '2025-05-12' + INTERVAL '11:44' HOUR TO MINUTE, DATE '2025-05-14' + INTERVAL '11:43' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, 3, DATE '2025-05-18' + INTERVAL '00:02' HOUR TO MINUTE, DATE '2025-05-19' + INTERVAL '02:54' HOUR TO MINUTE FROM DUAL;
Both output:
C_ID | O_IDS | T_START | T_STOP |
---|---|---|---|
1 | 1,2 | 2025-05-07 12:06:00 | 2025-05-10 11:24:00 |
1 | 3 | 2025-05-10 15:41:00 | 2025-05-11 12:38:00 |
1 | 4,5,6 | 2025-05-14 11:24:00 | 2025-05-20 09:24:00 |
2 | 1,2 | 2025-05-10 11:42:00 | 2025-05-14 11:43:00 |
2 | 3 | 2025-05-18 00:02:00 | 2025-05-19 02:54:00 |