pythonarraysnumpy

Find Consecutive Repeats of Specific Length in NumPy


Say that I have a NumPy array:

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])

And I have a length m = 2 that the user specifies in order to see if there are any repeats of that length within the time series. In this case, the repeats of length m = 2 are:

[2, 2]
[5, 5]
[9, 9]
[9, 9]
[13, 13]

And the user can change this to m = 3 and the repeats of length m = 3 are:

[9, 9, 9]
[13, 13, 13]

I need a function that either returns the index of where a repeat is found or None. So, for m = 3 the function would return the following NumPy array of starting indices:

[11, 17]

And for m = 4 the function would return None. What's the cleanest and fastest way to accomplish this?

Update Note that the array does not have to be sorted and we are not interested in the result after a sort. We only want the result from the unsorted array. Your result for m = 2 should be the same for this array:

b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])

Solution

  • Approach #1

    We could leverage 1D convolution for a vectorized solution -

    def consec_repeat_starts(a, n):
        N = n-1
        m = a[:-1]==a[1:]
        return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1
    

    Sample runs -

    In [286]: a
    Out[286]: 
    array([ 0,  1,  2,  2,  3,  4,  5,  5,  6,  7,  8,  9,  9,  9, 10, 11, 12,
           13, 13, 13, 14, 15])
    
    In [287]: consec_repeat_starts(a, 2)
    Out[287]: array([ 2,  6, 11, 12, 17, 18])
    
    In [288]: consec_repeat_starts(a, 3)
    Out[288]: array([11, 17])
    
    In [289]: consec_repeat_starts(a, 4)
    Out[289]: array([], dtype=int64)
    

    Approach #2

    We could also make use of binary-erosion -

    from scipy.ndimage.morphology import binary_erosion
    
    def consec_repeat_starts_v2(a, n):
        N = n-1
        m = a[:-1]==a[1:]
        return np.flatnonzero(binary_erosion(m,[1]*N))-(N//2)