I have a list of time ranges represented as tuples of datetime objects:
time_ranges = [
(datetime(2023, 10, 26, 10, 0, 0), datetime(2023, 10, 26, 11, 0, 0)),
(datetime(2023, 10, 26, 10, 30, 0), datetime(2023, 10, 26, 12, 0, 0)),
(datetime(2023, 10, 26, 13, 0, 0), datetime(2023, 10, 26, 14, 0, 0)),
(datetime(2023, 10, 26, 13, 30, 0), datetime(2023, 10, 26, 15, 0, 0)),
(datetime(2023, 10, 26, 15, 30, 0), datetime(2023, 10, 26, 16, 0, 0)),
]
I need to efficiently find all overlapping time ranges within this list. Two time ranges overlap if they share any common time. The output should be a list of tuples, where each tuple contains the indices of the overlapping ranges. For the example above, the desired output would be:
overlapping_ranges = [
(0, 1),
(2, 3),
]
I've tried a naive approach using nested loops to compare each range with every other range, but this has O(n^2) complexity and is too slow for my large dataset.
It would be helpful if the solution could also handle cases where a time range completely encompasses another time range (e.g., (10:00, 14:00) overlaps with (11:00, 13:00)).
We are given a finite set of ranges of numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai ≤ Bi.
Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point. Associate each of these with the range it comes from.
E.g.
[
[12, 25], #1
[14, 27], #2
[15, 22], #3
[17, 21], #4
[20, 65], #5
[62, 70], #6
[64, 80] #7
]
In this example, we'd get: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-
Maintain a set of the range ids currently in scope.
Iterate through the list, adding the relevant range id to the set for each + and removing it for each -.
To continue the example:
val, set
init, {}
12, {1 [12, 25]}
14, {1 [12, 25], 2 [14, 27]} # found (1,2)
15, {1 [12, 25], 2 [14, 27], 3 [15, 22]} # found (1,3), (2,3)
17, {1 [12, 25], 2 [14, 27], 3 [15, 22], 4 [17, 21]} # found (1,4), (2,4), (3,4)
20, {1 [12, 25], 2 [14, 27], 3 [15, 22], 4 [17, 21], 5 [20,65]} # found (1,5), (2,5), (3,5), (4,5), (5,5)
21, {1 [12, 25], 2 [14, 27], 3 [15, 22], 5 [20,65]}
22, {1 [12, 25], 2 [14, 27], 5 [20,65]}
25, {2 [14, 27], 5 [20,65]}
27, {5 [20,65]}
62, {5 [20,65], 6 [62, 70]} # found (5,6)
64, {5 [20,65], 6 [62, 70], 7 [64, 80]} # found (5,7), (6,7)
65, {6 [62, 70], 7 [64, 80]}
70, {7 [64, 80]}
80, {}
As you go through this, each time you add to the set, output the appropriate tuples (which have the new set & one of the current sets at the time of adding the new set).
Here is Python3 code to execute this algorithm. I don't use Python regularly so this may not be canonical Python.
from datetime import datetime
time_ranges = [
(datetime(2023, 10, 26, 10, 0, 0), datetime(2023, 10, 26, 11, 0, 0)),
(datetime(2023, 10, 26, 10, 30, 0), datetime(2023, 10, 26, 12, 0, 0)),
(datetime(2023, 10, 26, 13, 0, 0), datetime(2023, 10, 26, 14, 0, 0)),
(datetime(2023, 10, 26, 13, 30, 0), datetime(2023, 10, 26, 15, 0, 0)),
(datetime(2023, 10, 26, 15, 30, 0), datetime(2023, 10, 26, 16, 0, 0)),
]
events = []
for idx, (start, end) in enumerate(time_ranges):
events.append((start, 'start', idx))
events.append((end, 'end', idx))
events.sort(key=lambda x: (x[0], x[1] == 'start'))
active_ranges = set()
overlapping_ranges = []
for time, event_type, idx in events:
if event_type == 'start':
for active_idx in active_ranges:
overlapping_ranges.append((active_idx, idx))
active_ranges.add(idx)
elif event_type == 'end':
active_ranges.remove(idx)
print(overlapping_ranges)