pythonnumpy

How to get start indices of regions of empty intervals?


I have sorted start indices (included) and end indices (excluded) of intervals (obtained by using seachsorted), for instance:

import numpy as np

# Both arrays are of same size, and sorted.
# Size of arrays is number of intervals.
# Intervals do not overlap.

# interval indices:                0  1  2  3  4  5
interval_start_idxs    = np.array([0, 3, 3, 3, 6, 7])
interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])

An empty interval is identified when

interval_start_idxs[interval_idx] == interval_end_excl_idxs[interval_idx]-1

I would like to identify the starts and ends of each region where intervals are empty. A region is made with one or several intervals sharing the same start indices and end excluded indices.

With previous data, expected result would then be:

empty_interval_starts     = [1, 4] # start is included
empty_intervals_ends_excl = [4, 5] # end is excluded

This result is to be understood as:


Solution

  • import numpy as np
    
    interval_start_idxs    = np.array([0, 3, 3, 3, 6, 7])
    interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])
    
    is_region_start = np.r_[True, np.diff(interval_start_idxs) != 0]
    is_region_end = np.roll(is_region_start, -1)
    is_empty = (interval_start_idxs == interval_end_excl_idxs - 1)
    
    empty_interval_starts = np.nonzero(is_region_start & is_empty)[0]
    empty_interval_ends_excl = np.nonzero(is_region_end & is_empty)[0] + 1
    

    Explanation:

    At present, the results (empty_interval_starts and empty_interval_ends_excl) are Numpy arrays. If you prefer them as lists, as written in the question, you might want to convert them with empty_interval_starts.tolist() and empty_interval_ends_excl.tolist(), respectively.