I have a dataframe with months and sales. If somebody does 250 sales for 3 consecutive months, they get a bonus.
Month | sales | 3 month consistency |
---|---|---|
Month 1 | 250 | |
Month 2 | 250 | |
Month 3 | 250 | 1 |
Month 4 | 250 | |
Month 5 | 250 | |
Month 6 | 250 | 1 |
Month 7 | 0 | |
Month 8 | 250 | |
Month 9 | 250 | |
Month 10 | 250 | 1 |
Month 11 | 250 | |
Month 12 | 250 | |
Month 13 | 0 | |
Month 14 | 250 | |
Month 15 | 250 | |
Month 16 | 250 | 1 |
I want to populate the 3 month consistency column
Tried using cum count with conditions but it did not work and I could not find such an example anywhere. What I tried was to use rolling window to identify every 3rd consecutive record with 250. But then it does not reset after the each set of 3 consecutive 250s.
df.groupby('id').apply(lambda x: x['gt_250'].rolling(min_periods=1, window=3).sum())
#-- grouped by id because this has to be computed for each salesperson_id
The result I got from it was,
Month | sales | 3 month consistency |
---|---|---|
Month 1 | 250 | |
Month 2 | 250 | |
Month 3 | 250 | 3 |
Month 4 | 250 | 3 |
Month 5 | 250 | 3 |
You could use a custom groupby.cumcount
and mod
:
N = 3 # number of consecutive months
# identify target months
m = df['sales'].ge(250)
# for groups of consecutive target months
group = (~m).cumsum()
# count the Nth repeat for each group
df['3 month consistency'] = (df[m].groupby(group).cumcount().mod(N).eq(N-1)
.map({True: 1})
)
NB. if you want a boolean column replace .map({True: 1})
by .reindex(df.index).eq(True)
.
Output:
Month sales 3 month consistency
0 Month 1 250 NaN
1 Month 2 250 NaN
2 Month 3 250 1.0
3 Month 4 250 NaN
4 Month 5 250 NaN
5 Month 6 250 1.0
6 Month 7 0 NaN
7 Month 8 250 NaN
8 Month 9 250 NaN
9 Month 10 250 1.0
10 Month 11 250 NaN
11 Month 12 250 NaN
12 Month 13 0 NaN
13 Month 14 250 NaN
14 Month 15 250 NaN
15 Month 16 250 1.0
Intermediates:
Month sales m group cumcount mod 3 month consistency
0 Month 1 250 True 0 0.0 0.0 NaN
1 Month 2 250 True 0 1.0 1.0 NaN
2 Month 3 250 True 0 2.0 2.0 1.0
3 Month 4 250 True 0 3.0 0.0 NaN
4 Month 5 250 True 0 4.0 1.0 NaN
5 Month 6 250 True 0 5.0 2.0 1.0
6 Month 7 0 False 1 NaN NaN NaN
7 Month 8 250 True 1 0.0 0.0 NaN
8 Month 9 250 True 1 1.0 1.0 NaN
9 Month 10 250 True 1 2.0 2.0 1.0
10 Month 11 250 True 1 3.0 0.0 NaN
11 Month 12 250 True 1 4.0 1.0 NaN
12 Month 13 0 False 2 NaN NaN NaN
13 Month 14 250 True 2 0.0 0.0 NaN
14 Month 15 250 True 2 1.0 1.0 NaN
15 Month 16 250 True 2 2.0 2.0 1.0
Variant (with a boolean output):
N = 3
m = df['sales'].ge(250)
group = m.ne(m.shift()).cumsum()
df['3 month consistency'] = df.groupby(group).cumcount().mod(N).eq(N-1)
Intermediates:
Month sales m group cumcount mod 3 month consistency
0 Month 1 250 True 1 0 0 False
1 Month 2 250 True 1 1 1 False
2 Month 3 250 True 1 2 2 True
3 Month 4 250 True 1 3 0 False
4 Month 5 250 True 1 4 1 False
5 Month 6 250 True 1 5 2 True
6 Month 7 0 False 2 0 0 False
7 Month 8 250 True 3 0 0 False
8 Month 9 250 True 3 1 1 False
9 Month 10 250 True 3 2 2 True
10 Month 11 250 True 3 3 0 False
11 Month 12 250 True 3 4 1 False
12 Month 13 0 False 4 0 0 False
13 Month 14 250 True 5 0 0 False
14 Month 15 250 True 5 1 1 False
15 Month 16 250 True 5 2 2 True
N = 3
m = df['sales'].ge(250)
group = m.ne(m.shift()).cumsum()
df['3 month consistency'] = df.groupby(['ID', group]).cumcount().mod(N).eq(N-1)
Example:
ID Month sales 3 month consistency
0 A Month 1 250 False
1 A Month 2 250 False
2 A Month 3 250 True
3 A Month 4 250 False
4 A Month 5 250 False
5 A Month 6 250 True
6 A Month 7 0 False
7 A Month 8 250 False
8 A Month 9 250 False
9 A Month 10 250 True
10 A Month 11 250 False
11 A Month 12 250 False
12 A Month 13 0 False
13 A Month 14 250 False
14 A Month 15 250 False
15 B Month 1 250 False # independent from A
16 B Month 2 250 False
17 B Month 3 250 True
18 B Month 4 250 False
19 B Month 5 250 False
20 B Month 6 250 True
21 B Month 7 0 False
22 B Month 8 250 False
23 B Month 9 250 False
24 B Month 10 250 True
25 B Month 11 250 False
26 B Month 12 250 False
27 B Month 13 0 False
28 B Month 14 250 False
29 B Month 15 250 False