I have sorted start indices (included) and end indices (excluded) of intervals (obtained by using seachsorted
), for instance:
import numpy as np
# Both arrays are of same size, and sorted.
# Size of arrays is number of intervals.
# Intervals do not overlap.
# interval indices: 0 1 2 3 4 5
interval_start_idxs = np.array([0, 3, 3, 3, 6, 7])
interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])
An empty interval is identified when
interval_start_idxs[interval_idx] == interval_end_excl_idxs[interval_idx]-1
I would like to identify the starts and ends of each region where intervals are empty. A region is made with one or several intervals sharing the same start indices and end excluded indices.
With previous data, expected result would then be:
empty_interval_starts = [1, 4] # start is included
empty_intervals_ends_excl = [4, 5] # end is excluded
This result is to be understood as:
import numpy as np
interval_start_idxs = np.array([0, 3, 3, 3, 6, 7])
interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])
is_region_start = np.r_[True, np.diff(interval_start_idxs) != 0]
is_region_end = np.roll(is_region_start, -1)
is_empty = (interval_start_idxs == interval_end_excl_idxs - 1)
empty_interval_starts = np.nonzero(is_region_start & is_empty)[0]
empty_interval_ends_excl = np.nonzero(is_region_end & is_empty)[0] + 1
Explanation:
is_region_start
marks the starts of all potential regions, i.e. indices where the current index differs from its predecessoris_region_start
by one to get is_region_end
; the rollover in the roll-back from index 0 to index -1 works in our favor here: the marker, previously at index 0, which is always True
, used to mark the start of the first potential region in is_region_start
and now marks the end of the last potential region in is_region_end
is_empty
marks all indices that are actually empty, according to your definitionempty_interval_starts
is the combination of two criteria: start of a potential region and actually being empty (since np.nonzero()
returns tuples, we need to extract the first element, …[0]
, to get to the actual array of indices)empty_interval_ends_excl
, likewise, is the combination of two criteria: end of a potential region and actually being empty; however, since empty_interval_ends_excl
should be exclusive, we need to add 1 to get the final resultAt present, the results (empty_interval_starts
and empty_interval_ends_excl
) are Numpy arrays. If you prefer them as lists, as written in the question, you might want to convert them with empty_interval_starts.tolist()
and empty_interval_ends_excl.tolist()
, respectively.