python pandas dataframe numpy

Pandas: group consecutive values and label each run with its length


I want to label each row with the length of the run of consecutive 1s it belongs to (0 for rows where a is 0). My data:


a
---
1
0
1
1
0
1
1
1
0
1
1
1

I want:

a | c 
--------
1   1
0   0
1   2
1   2
0   0
1   3
1   3
1   3
0   0
1   2
1   2

Then I can calculate the mean of column "b" grouped by "c". I tried shift, cumsum, and cumcount, but none of them worked.


Solution

  • Use GroupBy.transform on groups of consecutive values to get the size of each run, then set the result to 0 wherever column a is not 1:

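    # Consecutive equal values share one group id; 'size' gives each run's length
    df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
                  .transform('size')
                  .where(df.a.eq(1), 0))  # keep the length only where a == 1
    print(df)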
        a  b  c  c1
    0   1  1  1   1
    1   0  2  0   0
    2   1  3  2   2
    3   1  2  2   2
    4   0  1  0   0
    5   1  3  3   3
    6   1  1  3   3
    7   1  3  3   3
    8   0  2  0   0
    9   1  2  2   2
    10  1  1  2   2
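
To see why this works, the one-liner can be split into its intermediate steps. This is only a sketch: the DataFrame is rebuilt from the a and b columns of the printed output above, and the names run_start, run_id and run_len are made up here for illustration.

```python
import pandas as pd

# Sample data copied from columns a and b of the printed output above.
df = pd.DataFrame({
    "a": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "b": [1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 1],
})

# Step 1: True wherever a value differs from the previous row (a new run starts).
run_start = df["a"].ne(df["a"].shift())

# Step 2: the cumulative sum of those flags gives every run its own id.
run_id = run_start.cumsum()

# Step 3: broadcast the size of each run back onto its rows.
run_len = df.groupby(run_id)["a"].transform("size")

# Step 4: keep the length only where a == 1, otherwise write 0.
df["c1"] = run_len.where(df["a"].eq(1), 0)

# Show the intermediate columns next to the result.
print(df.assign(run_id=run_id, run_len=run_len))
```

Note that run_len also holds a length for the runs of 0 (here always 1), which is why the last step is needed.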
    

    If column a contains only 0 and 1, you can instead multiply by a:

    # Multiplying by a zeroes the length on rows where a == 0
    df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
                  .transform('size')
                  .mul(df.a))
    print(df)
        a  b  c  c1
    0   1  1  1   1
    1   0  2  0   0
    2   1  3  2   2
    3   1  2  2   2
    4   0  1  0   0
    5   1  3  3   3
    6   1  1  3   3
    7   1  3  3   3
    8   0  2  0   0
    9   1  2  2   2
    10  1  1  2   2
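
For the follow-up step mentioned in the question (the mean of b per label), a plain groupby on the new column should be enough. A minimal sketch, assuming the same sample data as in the output above; mean_by_len is just an illustrative name.

```python
import pandas as pd

# Sample data copied from columns a and b of the printed output above.
df = pd.DataFrame({
    "a": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "b": [1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 1],
})

# Run-length label, computed as in the answer.
df["c1"] = (df.groupby(df["a"].ne(df["a"].shift()).cumsum())["a"]
              .transform("size")
              .where(df["a"].eq(1), 0))

# Mean of b for each run-length label (0 collects all rows where a == 0).
mean_by_len = df.groupby("c1")["b"].mean()
print(mean_by_len)
```

Keep in mind that this pools every run of the same length, so the two separate runs of length 2 end up in one group. If you need one mean per individual run, group by the cumsum key instead of c1.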