I recently did a coding challenge where I was tasked to return the number of unique interval pairs that do not overlap when given the starting points in one list and the ending points in one list. I was able to come up with an n^2 solution and eliminated duplicates by using a set to hash each entry tuple of (start, end). I was wondering if there was a more efficient way of approaching this, or this is the best I could do:
def paperCuttings(starting, ending):
# Pair each start with its corresponding end and sort
intervals = sorted(zip(starting, ending), key=lambda x: x[1])
non_overlaps = set()
print(intervals)
# Store valid combinations
for i in range(len(intervals)):
for j in range(i+1, len(intervals)):
# If the ending of the first is less than the starting of the second, they do not overlap
if intervals[i][1] < intervals[j][0]:
non_overlaps.add((intervals[i], intervals[j]))
return len(non_overlaps)
starting = [1,1,6,7]
ending = [5,3,8,10]
print(paperCuttings(starting, ending)) # should return 4
starting2 = [3,1,2,8,8]
ending2 = [5, 3, 7, 10, 10]
print(paperCuttings(starting2, ending2)) # should return 3
I ask because I timed out in some hidden test cases
This is a O(n*log n) solution in Ruby (n being the number of intervals). I will include a detailed explanation that should make conversion of the code to Python straightforward.
I assume that non-overlapping intervals have no points in common, not even endpoints1.
def paperCuttings(starting, ending)
# Compute an array of unique intervals sorted by the beginning
# of each interval
intervals = starting.zip(ending).uniq.sort
n = intervals.size
count = 0
# Loop over the indices of all but the last interval.
# The interval at index i of intervals will be referred to
# below as "interval i"
(0..n-2).each do |i|
# intervals[i] is interval i, an array containing its two
# endpoints. Extract the second endpoint to the variable endpoint
_, endpoint = intervals[i]
# Employ a binary search to find the index of the first
# interval j > i for which intervals[j].first > endpoint,
# where intervals[j].first is the beginning of interval j
k = (i+1..n-1).bsearch { |j| intervals[j].first > endpoint }
# k equals nil if no such interval is found, in which case
# continue the loop the next interval i
next if k == nil
# As intervals i and k are non-overlapping, interval i is
# non-overlapping with all intervals l, k <=l<= n-1, of which
# there are n-k, so add n-k to count
count = count + n - k
end
# return count
count
end
Try it.
starting = [1, 1, 6, 7]
ending = [5, 3, 8, 10]
paperCuttings(starting, ending)
#=> 4
starting = [3, 1, 2, 8, 8]
ending = [5, 3, 7, 10, 10]
paperCuttings(starting, ending)
#=> 3
Here I will explain the calculation
intervals = starting.zip(ending).uniq.sort
for
starting = [3, 1, 2, 8, 8]
ending = [5, 3, 7, 10, 10]
a = starting.zip(ending)
#=> [[3, 5], [1, 3], [2, 7], [8, 10], [8, 10]]
b = a.uniq
#=> [[3, 5], [1, 3], [2, 7], [8, 10]]
b.sort
#=> [[1, 3], [2, 7], [3, 5], [8, 10]]
The removal of duplicates is required by the statement of the problem.
The elements of b
are sorted by their first elements. Had there been two arrays with the same first element the second element would be used as the tie-breaker, though that's not important here.
The documentation for Ruby's binary search method (over a range) is here. Binary searches have a time complexity of O(log n), which accounts for the log term in the overall time complexity of O(n*log n).
1. If intervals that share only a single endpoint are regarded as non-overlapping, change starting[j] >= endpoint
to starting[j] > endpoint
.