I am trying to create a cum count column that counts the instances of a boolean variable, but resets to zero when it encounters a zero in the boolean variable. I am looking for a conditional cumcount.
I have posted my code (it works) of an initial stab that I took at this issue. However, I am looking to see if anyone has something more efficient as the loop used in my solution my be slow in large dfs
def counter(series_converted_to_a_list):
counter = 0
counter_list = []
# loop through the list and count
for i in range(len(series_converted_to_a_list)):
if series_converted_to_a_list[i] > 0:
counter += 1
else:
counter = 0
counter_list.append(counter)
return counter_list
# Sample dataframe
df = pd.DataFrame({'bool': [1,0,1,1,1,1,1,0,1,1,1,0,0,1,1,1]})
# convert the boolean column to a list
bool_series_converted_to_a_list = list(df['bool'])
# use the function
counter_list = counter(bool_series_converted_to_a_list)
# convert the list into a column in the sample dataframe
df['counter_list'] = counter_list
df
You can use groupby.cumcount
after masking the 0s and setting a custom grouper on all 0-starting groups, and finally restoring the 0s with reindex
:
df = pd.DataFrame({'bool': [1,0,1,1,1,1,1,0,1,1,1,0,0,1,1,1]})
m = df['bool'].eq(1)
df['cumcount'] = (m[m].groupby((~m).cumsum()).cumcount().add(1)
.reindex(df.index, fill_value=0)
)
Or, maybe simpler, using a cumsum
which will ensure that 1 always starts on the first 1 per group:
m = df['bool'].eq(1)
df['cumcount'] = m.groupby((~m).cumsum()).cumsum()
Output:
bool cumcount
0 1 1
1 0 0
2 1 1
3 1 2
4 1 3
5 1 4
6 1 5
7 0 0
8 1 1
9 1 2
10 1 3
11 0 0
12 0 0
13 1 1
14 1 2
15 1 3