algorithmdatetime

Efficiently Finding Overlapping Time Ranges in Python


I have a list of time ranges represented as tuples of datetime objects:

time_ranges = [
    (datetime(2023, 10, 26, 10, 0, 0), datetime(2023, 10, 26, 11, 0, 0)),
    (datetime(2023, 10, 26, 10, 30, 0), datetime(2023, 10, 26, 12, 0, 0)),
    (datetime(2023, 10, 26, 13, 0, 0), datetime(2023, 10, 26, 14, 0, 0)),
    (datetime(2023, 10, 26, 13, 30, 0), datetime(2023, 10, 26, 15, 0, 0)),
    (datetime(2023, 10, 26, 15, 30, 0), datetime(2023, 10, 26, 16, 0, 0)),
]

I need to efficiently find all overlapping time ranges within this list. Two time ranges overlap if they share any common time. The output should be a list of tuples, where each tuple contains the indices of the overlapping ranges. For the example above, the desired output would be:

overlapping_ranges = [
    (0, 1),
    (2, 3),
]

I've tried a naive approach using nested loops to compare each range with every other range, but this has O(n^2) complexity and is too slow for my large dataset.

It would be helpful if the solution could also handle cases where a time range completely encompasses another time range (e.g., (10:00, 14:00) overlaps with (11:00, 13:00)).


Solution

    1. We are given a finite set of ranges of numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai ≤ Bi.

    2. Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point. Associate each of these with the range it comes from.

    E.g.

    [
        [12, 25], #1
        [14, 27], #2
        [15, 22], #3
        [17, 21], #4
        [20, 65], #5
        [62, 70], #6
        [64, 80]  #7
    ]
    

    In this example, we'd get: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-

    1. Maintain a set of the range ids currently in scope.

    2. Iterate through the list, adding the relevant range id to the set for each + and removing it for each -.

    To continue the example:
    val, set

    init, {}
    12, {1 [12, 25]}
    14, {1 [12, 25], 2 [14, 27]} # found (1,2)
    15, {1 [12, 25], 2 [14, 27], 3 [15, 22]} # found (1,3), (2,3)
    17, {1 [12, 25], 2 [14, 27], 3 [15, 22], 4 [17, 21]} # found (1,4), (2,4), (3,4)
    20, {1 [12, 25], 2 [14, 27], 3 [15, 22], 4 [17, 21], 5 [20,65]} # found (1,5), (2,5), (3,5), (4,5), (5,5)
    21, {1 [12, 25], 2 [14, 27], 3 [15, 22], 5 [20,65]}
    22, {1 [12, 25], 2 [14, 27], 5 [20,65]}
    25, {2 [14, 27], 5 [20,65]}
    27, {5 [20,65]}
    62, {5 [20,65], 6 [62, 70]} # found (5,6)
    64, {5 [20,65], 6 [62, 70], 7 [64, 80]} # found (5,7), (6,7)
    65, {6 [62, 70], 7 [64, 80]}
    70, {7 [64, 80]}
    80, {}

    As you go through this, each time you add to the set, output the appropriate tuples (which have the new set & one of the current sets at the time of adding the new set).

    Here is Python3 code to execute this algorithm. I don't use Python regularly so this may not be canonical Python.

    from datetime import datetime
    
        time_ranges = [
            (datetime(2023, 10, 26, 10, 0, 0), datetime(2023, 10, 26, 11, 0, 0)),
            (datetime(2023, 10, 26, 10, 30, 0), datetime(2023, 10, 26, 12, 0, 0)),
            (datetime(2023, 10, 26, 13, 0, 0), datetime(2023, 10, 26, 14, 0, 0)),
            (datetime(2023, 10, 26, 13, 30, 0), datetime(2023, 10, 26, 15, 0, 0)),
            (datetime(2023, 10, 26, 15, 30, 0), datetime(2023, 10, 26, 16, 0, 0)),
        ]
    
        events = []
        for idx, (start, end) in enumerate(time_ranges):
            events.append((start, 'start', idx))
            events.append((end, 'end', idx))
    
        events.sort(key=lambda x: (x[0], x[1] == 'start'))
    
        active_ranges = set()
        overlapping_ranges = []
    
        for time, event_type, idx in events:
            if event_type == 'start':
                for active_idx in active_ranges:
                    overlapping_ranges.append((active_idx, idx))
                active_ranges.add(idx)
            elif event_type == 'end':
                active_ranges.remove(idx)
    
        print(overlapping_ranges)