python python-3.x pandas windowing

Apply expanding window on subsections of dataframe


I have a dataframe like this:

  key1  day  feat
0    a    1  None
1    a    2     A
2    a    3  None
3    a    4     A
4    b    1     A
5    b    2  None
6    b    3  None
7    b    4     A

I would like to apply an expanding window with the count function over the feat column, but apply it per sub-category based on the key1 column.

E.g. I want my resulting df to be:

  key1  day  feat count
0    a    1  None   0
1    a    2     A   1
2    a    3  None   1
3    a    4     A   2
4    b    1     A   1
5    b    2  None   1
6    b    3  None   1
7    b    4     A   2

So in this case I would group by key1 and then apply the expanding window to each subgroup, so that the count resets for each group. Note that in my actual problem there are two keys I need to group by, not just one.


Solution

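  • Let's use groupby + cumsum: mark the non-null feat values with notna(), then take the cumulative sum within each key1 group, so the count resets per group.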

    import numpy as np
    df['feat'] = df['feat'].replace('None', np.nan)  # use only if None is a string
    # notna() flags non-missing rows; the grouped cumsum counts them per key1
    df['count'] = df['feat'].notna().groupby(df['key1']).cumsum()
    

    Result

      key1  day feat  count
    0    a    1  NaN      0
    1    a    2    A      1
    2    a    3  NaN      1
    3    a    4    A      2
    4    b    1    A      1
    5    b    2  NaN      1
    6    b    3  NaN      1
    7    b    4    A      2
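
Since the question mentions that the real data is grouped by two keys, here is a minimal self-contained sketch of how the same idea might extend to that case. The second key column (key2) and its values are hypothetical, made up purely for illustration, and the sample frame assumes the missing values are real None objects rather than the string 'None'.

```python
import pandas as pd

# Sample frame from the question; key2 is a hypothetical second grouping key
df = pd.DataFrame({
    "key1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "key2": ["x", "x", "y", "y", "x", "x", "y", "y"],  # assumption, not in the question
    "day": [1, 2, 3, 4, 1, 2, 3, 4],
    "feat": [None, "A", None, "A", "A", None, None, "A"],
})

# True where feat is present, False where it is missing
has_feat = df["feat"].notna()

# One key: running count of non-missing feat values, restarting per key1
df["count_one_key"] = has_feat.groupby(df["key1"]).cumsum()

# Two keys: pass a list of key columns, so the count restarts per (key1, key2) pair
df["count_two_keys"] = has_feat.groupby([df["key1"], df["key2"]]).cumsum()

print(df)
```

The rows are counted in their existing order, so if the frame is not already sorted by day within each group, sorting it first (for example with sort_values) would probably be needed to get a meaningful running count.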