I am trying to set all values that are <= 0, by group, to the maximum value in that group, but only after the last positive value. That is, all values <=0 in the group that come before the last positive value must be ignored. Example:
import pandas as pd

data = {'group': ['A', 'A', 'A', 'A', 'A', 'B', 'B',
                  'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1]}
df = pd.DataFrame(data)
df
group value
0 A 3
1 A 0
2 A 8
3 A 7
4 A 0
5 B -1
6 B 0
7 B 9
8 B -2
9 B 0
10 B 0
11 C 2
12 C 0
13 C 5
14 C 0
15 C 1
and the result must be:
group value
0 A 3
1 A 0
2 A 8
3 A 7
4 A 8
5 B -1
6 B 0
7 B 9
8 B 9
9 B 9
10 B 9
11 C 2
12 C 0
13 C 5
14 C 0
15 C 1
Thanks for any advice.
Start by adding a column that flags the rows with a non-positive value (that is, <= 0):
df['neg'] = (df['value'] <= 0)
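As a quick check of what this flag looks like, here is a minimal sketch that rebuilds the question's sample DataFrame and prints the new column:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 6 + ['C'] * 5,
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

# True for every row whose value is zero or negative
df['neg'] = df['value'] <= 0
print(df['neg'].tolist())
# [False, True, False, False, True, True, True, False, True, True, True, False, True, False, True, False]
```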
Then, for each group, find the trailing run of contiguous entries that have 'neg' set to True. To do that, reverse the order of the DataFrame (with .iloc[::-1]) and then use .cumprod() on the 'neg' column. cumprod() treats True as 1 and False as 0, so the cumulative product is 1 as long as you are seeing only True values, and becomes (and stays) 0 as soon as you hit the first False. Since the order is reversed, you are walking backwards from the end of each group, so this finds the run of True values at the end. .astype(bool) turns the 1/0 result back into a mask, and because it keeps the original index labels, assigning it to df['upd'] aligns it with the right rows even though it was computed in reverse order.
df['upd'] = df.iloc[::-1].groupby('group')['neg'].cumprod().astype(bool)
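To make the reverse-then-cumprod trick concrete, here is a small sketch on group B alone (its values copied from the question), without the groupby so the intermediate numbers are easy to read:

```python
import pandas as pd

# The 'value' column of group B from the question, in original order
b = pd.Series([-1, 0, 9, -2, 0, 0])
neg = b <= 0  # [True, True, False, True, True, True]

# Reverse, then take the running product: it stays 1 while only
# non-positive values have been seen, and drops to 0 at the first positive one
rev_prod = neg.iloc[::-1].cumprod()
print(rev_prod.tolist())  # [1, 1, 1, 0, 0, 0]

# Cast to bool and restore the original order to line it up with b
mask = rev_prod.astype(bool).sort_index()
print(mask.tolist())  # [False, False, False, True, True, True]
```

Only the last three rows (-2, 0, 0) are flagged; the leading -1 and 0 are cut off by the 9.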
Now that we know which entries to update, we just need to know what to update them to, which is the max of the group. transform('max') on a groupby returns that max for every row (indexed like df), so all that's left is to do the actual update of 'value' where 'upd' is set:
df.loc[df['upd'], 'value'] = df.groupby('group')['value'].transform('max')
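The reason transform('max') fits here, rather than a plain aggregation, is that it returns one value per row. A minimal sketch comparing the two on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 6 + ['C'] * 5,
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

# transform('max') broadcasts each group's max back to every row of that group,
# keeping df's index, so it can be assigned straight through df.loc[mask, ...]
group_max = df.groupby('group')['value'].transform('max')
print(group_max.tolist())
# [8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 5, 5, 5, 5, 5]

# A plain .max() collapses to one row per group instead
print(df.groupby('group')['value'].max().to_dict())
# {'A': 8, 'B': 9, 'C': 5}
```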
We can finish by dropping the two auxiliary columns we used in the process:
df = df.drop(['neg', 'upd'], axis=1)
The result I got matches your expected result.
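For reuse, the steps can be packaged into a function. This is only a sketch: the name fill_trailing_nonpositive and its group_col/value_col parameters are hypothetical, and unlike the steps above it works on a copy and returns a new DataFrame instead of modifying df in place:

```python
import pandas as pd


def fill_trailing_nonpositive(df, group_col='group', value_col='value'):
    """Replace each group's trailing run of values <= 0 with the group max.

    Hypothetical helper wrapping the steps above; df itself is not modified.
    """
    out = df.copy()
    out['neg'] = out[value_col] <= 0
    # Reversed per-group running product: 1 only inside the trailing run of True
    out['upd'] = out.iloc[::-1].groupby(group_col)['neg'].cumprod().astype(bool)
    # Group max broadcast to every row, written only where 'upd' is True
    out.loc[out['upd'], value_col] = out.groupby(group_col)[value_col].transform('max')
    return out.drop(['neg', 'upd'], axis=1)


df = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 6 + ['C'] * 5,
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})
result = fill_trailing_nonpositive(df)
print(result['value'].tolist())
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```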
UPDATE: Or do the whole operation in a single (long!) statement, without adding any auxiliary columns to the original DataFrame:
df.loc[
    df.assign(neg=(df['value'] <= 0))
      .iloc[::-1]
      .groupby('group')['neg']
      .cumprod()
      .astype(bool),
    'value'
] = df.groupby('group')['value'].transform('max')
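A small usage check of the one-statement version: it gives the expected result on the question's data and leaves df without helper columns. The second frame, with a hypothetical group D that has no positive value at all, shows an edge case the question does not specify: every row of such a group counts as "trailing", so the whole group is set to its max. Whether that is the desired behavior depends on your requirements.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 6 + ['C'] * 5,
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

# df.assign() builds 'neg' on a temporary frame, so df never gains helper columns
df.loc[
    df.assign(neg=(df['value'] <= 0))
      .iloc[::-1]
      .groupby('group')['neg']
      .cumprod()
      .astype(bool),
    'value'
] = df.groupby('group')['value'].transform('max')

print(df.columns.tolist())   # ['group', 'value']
print(df['value'].tolist())  # [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]

# Hypothetical group with no positive value: the whole group becomes its max (0)
d = pd.DataFrame({'group': ['D', 'D', 'D'], 'value': [-3, 0, -1]})
d.loc[
    d.assign(neg=(d['value'] <= 0))
      .iloc[::-1]
      .groupby('group')['neg']
      .cumprod()
      .astype(bool),
    'value'
] = d.groupby('group')['value'].transform('max')
print(d['value'].tolist())  # [0, 0, 0]
```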