pythonalgorithmcombinationsintervals

Combination of Non-overlapping interval PAIRS


I recently did a coding challenge where I was tasked to return the number of unique interval pairs that do not overlap when given the starting points in one list and the ending points in one list. I was able to come up with an n^2 solution and eliminated duplicates by using a set to hash each entry tuple of (start, end). I was wondering if there was a more efficient way of approaching this, or this is the best I could do:

def paperCuttings(starting, ending):
    # Pair each start with its corresponding end and sort
    intervals = sorted(zip(starting, ending), key=lambda x: x[1])
    non_overlaps = set()

    print(intervals)

    # Store valid combinations
    for i in range(len(intervals)):
        for j in range(i+1, len(intervals)):
            # If the ending of the first is less than the starting of the second, they do not overlap
            if intervals[i][1] < intervals[j][0]:                
                non_overlaps.add((intervals[i], intervals[j]))

    return len(non_overlaps)

starting = [1,1,6,7]
ending = [5,3,8,10]
print(paperCuttings(starting, ending)) # should return 4

starting2 = [3,1,2,8,8]
ending2 = [5, 3, 7, 10, 10]
print(paperCuttings(starting2, ending2))  # should return 3

I ask because I timed out in some hidden test cases


Solution

  • This is a O(n*log n) solution in Ruby (n being the number of intervals). I will include a detailed explanation that should make conversion of the code to Python straightforward.

    I assume that non-overlapping intervals have no points in common, not even endpoints1.

    def paperCuttings(starting, ending)
      # Compute an array of unique intervals sorted by the beginning
      # of each interval
      intervals = starting.zip(ending).uniq.sort
    
      n = intervals.size
      count = 0
    
      # Loop over the indices of all but the last interval.
      # The interval at index i of intervals will be referred to
      # below as "interval i" 
      (0..n-2).each do |i|
        # intervals[i] is interval i, an array containing its two
        # endpoints. Extract the second endpoint to the variable endpoint
        _, endpoint = intervals[i]
         
        # Employ a binary search to find the index of the first
        # interval j > i for which intervals[j].first > endpoint,
        # where intervals[j].first is the beginning of interval j
        k = (i+1..n-1).bsearch { |j| intervals[j].first > endpoint }
    
        # k equals nil if no such interval is found, in which case
        # continue the loop the next interval i
        next if k == nil
    
        # As intervals i and k are non-overlapping, interval i is
        # non-overlapping with all intervals l, k <=l<= n-1, of which
        # there are n-k, so add n-k to count
        count = count + n - k
      end
      # return count
      count
    end
    

    Try it.

    starting = [1, 1, 6,  7]
    ending   = [5, 3, 8, 10]
    
    paperCuttings(starting, ending)
      #=> 4
    
    starting = [3, 1, 2,  8,  8]
    ending   = [5, 3, 7, 10, 10]
    paperCuttings(starting, ending)
      #=> 3
    

    Here I will explain the calculation

    intervals = starting.zip(ending).uniq.sort
    

    for

    starting = [3, 1, 2,  8,  8]
    ending   = [5, 3, 7, 10, 10]
    
    a = starting.zip(ending)
      #=> [[3, 5], [1, 3], [2, 7], [8, 10], [8, 10]]
    b = a.uniq
      #=> [[3, 5], [1, 3], [2, 7], [8, 10]]
    b.sort
      #=> [[1, 3], [2, 7], [3, 5], [8, 10]]
    

    The removal of duplicates is required by the statement of the problem.

    The elements of b are sorted by their first elements. Had there been two arrays with the same first element the second element would be used as the tie-breaker, though that's not important here.

    The documentation for Ruby's binary search method (over a range) is here. Binary searches have a time complexity of O(log n), which accounts for the log term in the overall time complexity of O(n*log n).

    1. If intervals that share only a single endpoint are regarded as non-overlapping, change starting[j] >= endpoint to starting[j] > endpoint.