Tags: python, arrays, numpy

How to get only the first occurrence of each increasing value in a NumPy array?


While working on first-passage probabilities, I encountered this problem. I want a NumPy-idiomatic way (without explicit loops) to keep only the first occurrence of each strictly increasing value in every row of a NumPy array, replacing repeated or non-increasing values with zeros. For instance, given

arr = np.array([
    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
    [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])

I would like to get as output:

out = np.array([
    [1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 5, 0],
    [1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0],
    [3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0]])

Solution

  • The running maximum can be accumulated per row:

    >>> arr
    array([[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
           [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
           [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])
    >>> np.maximum.accumulate(arr, axis=1)
    array([[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
           [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4],
           [3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5]])
    

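    The running maximum changes only where a new, strictly larger value first appears, so a nonzero difference between consecutive entries marks the values to keep. Everything else can be masked out with zeros: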

    >>> m_arr = np.maximum.accumulate(arr, axis=1)
    >>> np.where(np.diff(m_arr, axis=1, prepend=0), arr, 0)
    array([[1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 5, 0],
           [1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0],
           [3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0]])
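
For reuse, the two steps above can be wrapped in a small helper. This is only a sketch: the function name `first_increasing` and the `fill` parameter are hypothetical, not part of NumPy. It also swaps `np.diff(..., prepend=0)` for an explicit comparison of neighbouring columns, so the first column is always kept and the replacement value does not have to be zero.

```python
import numpy as np


def first_increasing(arr, fill=0):
    """Keep the first occurrence of each new running maximum in every row.

    `first_increasing` and `fill` are illustrative names, not a NumPy API.
    """
    arr = np.asarray(arr)
    # Running maximum along each row.
    running_max = np.maximum.accumulate(arr, axis=1)
    # A position is kept when the running maximum changes there;
    # the first column always starts a new maximum.
    is_new = np.ones(arr.shape, dtype=bool)
    is_new[:, 1:] = running_max[:, 1:] != running_max[:, :-1]
    # Keep the original value at those positions, `fill` everywhere else.
    return np.where(is_new, arr, fill)


arr = np.array([
    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
    [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])
result = first_increasing(arr)
print(result)
```

Passing a different `fill` (for example `-1`) may help when the data itself can contain zeros, since masked positions then stay distinguishable from genuine values.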