[SOLVED] Pandas - Group by with same dimension

I have a dataframe with five columns: G1, G2, G3, G4 and Tax. For each G column, I group by that column and Tax to count how many rows have the value 1 for each Tax. But the result has a different shape for each G, because some G columns only contain 0 values.

This is my code:

self.instance.df.groupby(["G1", "Tax"]).size().drop(0)
self.instance.df.groupby(["G2", "Tax"]).size().drop(0)

I'm dropping the index value 0 because I only need the counts for the value 1.

INITIAL DATAFRAME:

       G1  G2  G3  G4  Tax
   0   0   1   0   0   1.1
   1   1   1   0   1   1.1
   2   1   1   0   0   1.2
   3   0   1   0   1   1.3
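Running the question's pattern on just these four sample rows shows where the different shapes come from. This is a minimal sketch that uses a plain `df` in place of `self.instance.df`; the full data behind the expected output isn't shown, so the counts here won't match it.

```python
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    "G1": [0, 1, 1, 0],
    "G2": [1, 1, 1, 1],
    "G3": [0, 0, 0, 0],
    "G4": [0, 1, 0, 1],
    "Tax": [1.1, 1.1, 1.2, 1.3],
})

# Same pattern as the question: count (G, Tax) pairs, then drop the G == 0 entries
g1 = df.groupby(["G1", "Tax"]).size().drop(0)
g3 = df.groupby(["G3", "Tax"]).size().drop(0)

# size() only creates entries for (G, Tax) pairs that actually occur,
# so each G column ends up with its own set of Tax values
print(len(g1))  # 2  (Tax 1.1 and 1.2)
print(len(g3))  # 0  (G3 is never 1)
```

Because a Tax value with no 1s simply has no `(1, Tax)` pair, it disappears instead of showing up as a 0 count.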

The result I'm expecting is:

G1  Tax
1   1.1   5
    1.2   7
    1.5   3
    1.7   0

And the result I'm getting:

Solution

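IIUC, you can greatly simplify your logic by performing a single groupby on Tax over all the G columns at once. Every G column then gets a count for every Tax value (including 0), so all the results have the same shape: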

out = (df.set_index('Tax').filter(like='G').eq(1)
         .groupby(level='Tax').sum()
       )
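Applied to the four sample rows from the question (a sketch, with `df` standing in for `self.instance.df`), each column of `out` is a Series over the same Tax values, so a column like G3 that is never 1 comes back as zeros rather than as an empty result:

```python
import pandas as pd

df = pd.DataFrame({
    "G1": [0, 1, 1, 0],
    "G2": [1, 1, 1, 1],
    "G3": [0, 0, 0, 0],
    "G4": [0, 1, 0, 1],
    "Tax": [1.1, 1.1, 1.2, 1.3],
})

out = (df.set_index('Tax').filter(like='G').eq(1)
         .groupby(level='Tax').sum()
       )

# Every G column shares the index [1.1, 1.2, 1.3], zero counts included
for col in out.columns:
    print(col, out[col].tolist())
```

If you still want one Series per G column, as in the original code, `out['G1']`, `out['G2']` and so on all have the same length.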

Or:

cols = ['G1', 'G2', 'G3', 'G4']

out = df.groupby('Tax')[cols].agg(lambda x: x.eq(1).sum())

And if your G columns only ever contain 0 or 1, a plain sum already counts the 1s:

cols = ['G1', 'G2', 'G3', 'G4']

out = df.groupby('Tax')[cols].sum()
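If you aren't sure the columns hold only 0/1, one option is to check first and fall back to the `eq(1)` version otherwise. This sketch assumes a hypothetical stray value `2` in G3 to show why the check matters: a plain sum would count it as 2, while `eq(1)` ignores it.

```python
import pandas as pd

df = pd.DataFrame({
    "G1": [0, 1, 1, 0],
    "G2": [1, 1, 1, 1],
    "G3": [0, 0, 0, 2],  # hypothetical stray value that is neither 0 nor 1
    "G4": [0, 1, 0, 1],
    "Tax": [1.1, 1.1, 1.2, 1.3],
})
cols = ['G1', 'G2', 'G3', 'G4']

# A plain sum equals "number of 1s" only when every value is 0 or 1
if df[cols].isin([0, 1]).all().all():
    out = df.groupby('Tax')[cols].sum()
else:
    out = df.groupby('Tax')[cols].agg(lambda x: x.eq(1).sum())

print(out)
```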

Example:

# input
   G1  G2  G3  G4 Tax
0   0   1   0   0   A
1   1   1   0   1   A
2   1   1   0   0   A
3   0   1   0   1   B

# output
     G1  G2  G3  G4
Tax                
A     2   3   0   1
B     0   1   0   1
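The example above can be reproduced with a short script that builds the input and runs all three variants; on this 0/1 data they should produce the same table:

```python
import pandas as pd

# Input from the example above
df = pd.DataFrame({
    "G1": [0, 1, 1, 0],
    "G2": [1, 1, 1, 1],
    "G3": [0, 0, 0, 0],
    "G4": [0, 1, 0, 1],
    "Tax": ["A", "A", "A", "B"],
})
cols = ["G1", "G2", "G3", "G4"]

# Variant 1: mark the 1s, then sum per Tax
out1 = (df.set_index("Tax").filter(like="G").eq(1)
          .groupby(level="Tax").sum())

# Variant 2: count the 1s per column with an explicit aggregation
out2 = df.groupby("Tax")[cols].agg(lambda x: x.eq(1).sum())

# Variant 3: plain sum, valid only because the values are 0/1
out3 = df.groupby("Tax")[cols].sum()

print(out1)
```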