
groupby count after conditional - python


I'm trying to perform a groupby sum on a specific column in a pandas DataFrame, but I only want to include rows that pass a certain threshold. For this example, that is where B > 2.

The groupby is on A and the values being summed are in C. The correct output should be:

a = 3

b = 9

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))

df.loc[(df['B'] > 2), 'Count'] = df.groupby('A')['C'].transform('sum')

df['Count'] = df['Count'].replace(np.nan, 0).astype(int)

Out (current result; the value I want is marked with *):

   A  B  C  Count
0  a  1  9      0
1  b  1  9      0
2  a  3  0     12 *3
3  b  4  9     18 *9
4  a  5  1     12 *3
5  a  6  2     12 *3

Solution

  • Use the mask on both sides of the assignment, so that only rows where B > 2 are summed:

    m = df['B'] > 2
    df['Count'] = 0
    df.loc[m, 'Count'] = df[m].groupby('A')['C'].transform('sum')
    
    print (df)
       A  B  C  Count
    0  a  1  9      0
    1  b  1  9      0
    2  a  3  0      3
    3  b  4  9      9
    4  a  5  1      3
    5  a  6  2      3
    

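    Another idea is to use Series.where to zero out C outside the mask before grouping, and again afterwards to reset the rows outside the mask:

    m = df['B'] > 2
    df['Count'] = df['C'].where(m, 0).groupby(df['A']).transform('sum').where(m, 0)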
    

    Or numpy.where for the final step:

    m = df['B'] > 2
    s = df['C'].where(m, 0).groupby(df['A']).transform('sum')
    df['Count'] = np.where(m, s, 0)
    

    Or multiply by the mask, which treats True as 1 and False as 0:

    m = df['B'] > 2
    df['Count'] = df['C'].mul(m).groupby(df['A']).transform('sum').mul(m)
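
To see why the mask has to be applied before the groupby and not only on the assignment target, here is a minimal self-contained sketch using the sample frame from the question. The names `unfiltered`, `filtered` and `zeroed` are made up for illustration, and `reindex` is just one possible way to fill the rows outside the mask with 0.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))
m = df['B'] > 2

# Grouping the unfiltered frame sums every row of each group,
# including the rows where B <= 2.
unfiltered = df.groupby('A')['C'].transform('sum')

# Filtering first restricts the sum to the rows where B > 2.
filtered = df[m].groupby('A')['C'].transform('sum')

# Rows outside the mask are missing from `filtered`; reindex fills them with 0.
df['Count'] = filtered.reindex(df.index, fill_value=0)

# Route without filtering: zero out C outside the mask, sum per group,
# then zero the rows outside the mask again.
zeroed = df['C'].where(m, 0).groupby(df['A']).transform('sum').where(m, 0)

print(df.assign(Unfiltered=unfiltered, Zeroed=zeroed))
```

On this data, `Unfiltered` holds 12 for group a and 18 for group b, while `Count` and `Zeroed` both give 3 and 9 on the rows where B > 2 and 0 elsewhere.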