pythonpandas

What is the best way to get the last non zero value in a window of N rows?


This is my dataframe:

df = pd.DataFrame({
    'a': [0, 0, 1, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 1, 0]
})

Expected output is creating column b:

    a  b
0   0  0
1   0  0
2   1  0
3  -1  1
4  -1 -1
5   0 -1
6   0 -1
7   0 -1
8   0  0
9   0  0
10 -1  0
11  0 -1
12  0 -1
13  1 -1
14  0  1

Logic:

I explain the logic by some examples:

I want to create column b to df

I want to have a window of three rows

for example for row number 3 I want to look at three previous rows and capture the last non 0 value. if all of the values are 0 then 'b' is 0. in this case the last non zero value is 1. so column b is 1

for example for row number 4 . The last non zero value is -1 so column b is -1

I want to do the same for all rows.

This is what I have tried so far. I think there must be a better way.

import pandas as pd
df = pd.DataFrame({
    'a': [0, 0, 1, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 1, 0]
})

def last_nonzero(x):
    # x is a pandas Series representing a window
    nonzero = x[x != 0]
    if not nonzero.empty:
        # Return the last non-zero value in the window (i.e. the one closest to the current row)
        return nonzero.iloc[-1]
    return 0

# Shift by 1 so that the rolling window looks only at previous rows.
# Use a window size of 3 and min_periods=1 to allow early rows.
df['b'] = df['a'].shift(1).rolling(window=3, min_periods=1).apply(last_nonzero, raw=False).astype(int)

Solution

  • I don't think there is a much more straightforward approach. There is currently no rolling.last method.

    You could however simplify a bit your code:

    def last_nonzero(s):
        return 0 if (x:=s[s != 0]).empty else x.iloc[-1]
    
    df['b'] = (df['a'].shift(1, fill_value=0)
               .rolling(window=3, min_periods=1).apply(last_nonzero)
               .convert_dtypes()
              )
    

    With a lambda:

    df['b'] = (df['a'].shift(1, fill_value=0)
               .rolling(window=3, min_periods=1)
               .apply(lambda s: 0 if (x:=s[s != 0]).empty else x.iloc[-1])
               .convert_dtypes()
              )
    

    Actually, if you have a range index, you could also use a merge_asof on the indices:

    window = 3
    out = pd.merge_asof(
        df,
        df['a'].shift(1, fill_value=0).loc[lambda x: x != 0].rename('b'),
        left_index=True,
        right_index=True,
        tolerance=window-1,
        direction='backward',
    ).fillna({'b': 0})
    

    Output:

        a   b
    0   0   0
    1   0   0
    2   1   0
    3  -1   1
    4  -1  -1
    5   0  -1
    6   0  -1
    7   0  -1
    8   0   0
    9   0   0
    10 -1   0
    11  0  -1
    12  0  -1
    13  1  -1
    14  0   1