[SOLVED] groupby count after conditional


I'm trying to perform a groupby sum on a specific column in a pandas DataFrame, but only rows that pass a certain threshold should be included in the sum. For this example, the condition is B > 2.

The grouping is on A and the sum is on C. The correct output should be:

a = 3

b = 9
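The expected 3 and 9 are sums of C over the rows with B > 2, not row counts. Group a has three such rows (C = 0, 1, 2) and group b has one (C = 9). The sketch below computes both aggregates per group on the sample data to make the difference concrete:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))
m = df['B'] > 2

# Aggregate only the rows that pass the threshold
passing = df[m].groupby('A')['C']
sums = passing.sum()      # sum of C per group: a -> 3, b -> 9
counts = passing.count()  # number of passing rows per group: a -> 3, b -> 1
print(pd.DataFrame({'sum': sums, 'count': counts}))
```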

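import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))
# only rows with B > 2 receive a value; the other rows become NaN
df.loc[(df['B'] > 2), 'Count'] = df.groupby('A')['C'].transform('sum')
# replace the NaN rows with 0 and convert back to integers
df['Count'] = df['Count'].replace(np.nan, 0).astype(int)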

Out:

   A  B  C  Count
0  a  1  9      0
1  b  1  9      0
2  a  3  0     12  (should be 3)
3  b  4  9     18  (should be 9)
4  a  5  1     12  (should be 3)
5  a  6  2     12  (should be 3)
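The attempt gives 12 and 18 because the condition only restricts which rows the `.loc` assignment writes to. The right-hand side, `df.groupby('A')['C'].transform('sum')`, is evaluated on every row first, so the B <= 2 rows (C = 9 in both groups) still count toward each group's total: 3 + 9 = 12 for a and 9 + 9 = 18 for b. A small check on the sample data, comparing the unfiltered and filtered transforms:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))

# Group totals over ALL rows: the B > 2 condition is never applied here
unfiltered = df.groupby('A')['C'].transform('sum')
print(unfiltered.tolist())  # [12, 18, 12, 18, 12, 12]

# Group totals over only the rows that pass the threshold (index 2, 3, 4, 5)
m = df['B'] > 2
filtered = df[m].groupby('A')['C'].transform('sum')
print(filtered.tolist())  # [3, 9, 3, 3]
```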

Solution

Use the mask on both sides of the assignment, so the groupby only sees the filtered rows:

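m = df['B'] > 2
df['Count'] = 0  # default for rows that fail the condition
# filter before grouping, then write the group sums back to the same rows
df.loc[m, 'Count'] = df[m].groupby('A')['C'].transform('sum')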

print(df)
   A  B  C  Count
0  a  1  9      0
1  b  1  9      0
2  a  3  0      3
3  b  4  9      9
4  a  5  1      3
5  a  6  2      3
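A variant, sketched here on the same sample data, is to aggregate once into one total per group and then look each row's total up by key with `Series.map`. Groups with no passing rows would map to NaN, so the sketch fills those with 0 before masking:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))
m = df['B'] > 2

# One total per group, using only the rows where B > 2
totals = df[m].groupby('A')['C'].sum()

# Look up each row's group total; a group with no passing rows would give NaN, so fill with 0
per_row = df['A'].map(totals).fillna(0).astype(int)

# Rows that fail the condition get 0, matching the expected Count column
df['Count'] = per_row.where(m, 0)
print(df['Count'].tolist())  # [0, 0, 3, 9, 3, 3]
```

This keeps the per-group totals around as a small Series, which can be handy if they are also needed on their own.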

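Another idea is to use Series.where: zero out C in the rows that fail the condition, sum per group, then set those rows back to 0:

m = df['B'] > 2
df['Count'] = df['C'].where(m, 0).groupby(df['A']).transform('sum').where(m, 0)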

Or numpy.where:

m = df['B'] > 2
df['Count'] = np.where(m, df['C'].where(m, 0).groupby(df['A']).transform('sum'), 0)

Or multiply by the mask (True acts as 1, False as 0):

m = df['B'] > 2
df['Count'] = df['C'].mul(m).groupby(df['A']).transform('sum').mul(m)
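If this pattern comes up with different conditions, it can be wrapped in a small function. `masked_group_sum` below is a hypothetical helper, not a pandas API: it zeroes out the value column in rows that fail the mask, sums per group, and writes 0 to the failing rows. The second call uses B > 4, where group b has no passing rows, to show that such a group ends up with 0 rather than NaN:

```python
import pandas as pd


def masked_group_sum(df, key, value, mask):
    """Hypothetical helper: per-row sum of `value` within each `key` group,
    counting only rows where `mask` is True, and 0 in rows where it is False."""
    kept = df[value].where(mask, 0)  # failing rows contribute nothing to the sum
    return kept.groupby(df[key]).transform('sum').where(mask, 0)


df = pd.DataFrame(dict(A=list('ababaa'), B=[1, 1, 3, 4, 5, 6], C=[9, 9, 0, 9, 1, 2]))

df['Count'] = masked_group_sum(df, 'A', 'C', df['B'] > 2)
print(df['Count'].tolist())  # [0, 0, 3, 9, 3, 3]

# With B > 4, group b has no passing rows and gets 0 everywhere
df['Count4'] = masked_group_sum(df, 'A', 'C', df['B'] > 4)
print(df['Count4'].tolist())  # [0, 0, 0, 0, 3, 3]
```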