This is my dataframe:
df = pd.DataFrame({
'a': [0, 0, 1, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 1, 0]
})
Expected output is creating column b
:
a b
0 0 0
1 0 0
2 1 0
3 -1 1
4 -1 -1
5 0 -1
6 0 -1
7 0 -1
8 0 0
9 0 0
10 -1 0
11 0 -1
12 0 -1
13 1 -1
14 0 1
Logic:
I explain the logic by some examples:
I want to create column b to df
I want to have a window of three rows
for example for row number 3 I want to look at three previous rows and capture the last non 0 value. if all of the values are 0 then 'b' is 0. in this case the last non zero value is 1. so column b is 1
for example for row number 4 . The last non zero value is -1 so column b is -1
I want to do the same for all rows.
This is what I have tried so far. I think there must be a better way.
import pandas as pd
df = pd.DataFrame({
'a': [0, 0, 1, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 1, 0]
})
def last_nonzero(x):
# x is a pandas Series representing a window
nonzero = x[x != 0]
if not nonzero.empty:
# Return the last non-zero value in the window (i.e. the one closest to the current row)
return nonzero.iloc[-1]
return 0
# Shift by 1 so that the rolling window looks only at previous rows.
# Use a window size of 3 and min_periods=1 to allow early rows.
df['b'] = df['a'].shift(1).rolling(window=3, min_periods=1).apply(last_nonzero, raw=False).astype(int)
I don't think there is a much more straightforward approach. There is currently no rolling.last
method.
You could however simplify a bit your code:
def last_nonzero(s):
return 0 if (x:=s[s != 0]).empty else x.iloc[-1]
df['b'] = (df['a'].shift(1, fill_value=0)
.rolling(window=3, min_periods=1).apply(last_nonzero)
.convert_dtypes()
)
With a lambda:
df['b'] = (df['a'].shift(1, fill_value=0)
.rolling(window=3, min_periods=1)
.apply(lambda s: 0 if (x:=s[s != 0]).empty else x.iloc[-1])
.convert_dtypes()
)
Actually, if you have a range index, you could also use a merge_asof
on the indices:
window = 3
out = pd.merge_asof(
df,
df['a'].shift(1, fill_value=0).loc[lambda x: x != 0].rename('b'),
left_index=True,
right_index=True,
tolerance=window-1,
direction='backward',
).fillna({'b': 0})
Output:
a b
0 0 0
1 0 0
2 1 0
3 -1 1
4 -1 -1
5 0 -1
6 0 -1
7 0 -1
8 0 0
9 0 0
10 -1 0
11 0 -1
12 0 -1
13 1 -1
14 0 1