I have a dataframe that has five columns: G1, G2, G3, G4 and Tax. For each G column, I group by that column and Tax to count how many rows have the value 1. But the shape of the result is different for each G, because some G columns only have 0 values.
This is my code:
self.instance.df.groupby(["G1", "Tax"]).size().drop(0)
self.instance.df.groupby(["G2", "Tax"]).size().drop(0)
I'm dropping the index 0 because I only need the 1 values.
INITIAL DATAFRAME:
   G1  G2  G3  G4  Tax
0   0   1   0   0  1.1
1   1   1   0   1  1.1
2   1   1   0   0  1.2
3   0   1   0   1  1.3
The result I'm expecting is:
G1  Tax
1   1.1    5
    1.2    7
    1.5    3
    1.7    0
And the result I'm getting:
G1  Tax
1   1.1    5
    1.2    7
    1.5    3
IIUC, you can greatly simplify your logic by performing a single groupby:
out = (df.set_index('Tax').filter(like='G').eq(1)
         .groupby(level='Tax').sum()
      )
Or:
cols = ['G1', 'G2', 'G3', 'G4']
out = df.groupby('Tax')[cols].agg(lambda x: x.eq(1).sum())
And if your values are only 0/1:
cols = ['G1', 'G2', 'G3', 'G4']
out = df.groupby('Tax')[cols].sum()
Example:
# input
   G1  G2  G3  G4 Tax
0   0   1   0   0   A
1   1   1   0   1   A
2   1   1   0   0   A
3   0   1   0   1   B

# output
     G1  G2  G3  G4
Tax
A     2   3   0   1
B     0   1   0   1
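For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the example input above and runs the three variants side by side. The variable names (`out_filter`, `out_agg`, `out_sum`, `g1_counts`) are only illustrative, and the last step assumes you want a per-column result like the one in the question.

```python
import pandas as pd

# Rebuild the example input shown above (Tax is a label here, as in the example).
df = pd.DataFrame({
    "G1": [0, 1, 1, 0],
    "G2": [1, 1, 1, 1],
    "G3": [0, 0, 0, 0],
    "G4": [0, 1, 0, 1],
    "Tax": ["A", "A", "A", "B"],
})
cols = ["G1", "G2", "G3", "G4"]

# Variant 1: flag the cells equal to 1, then sum the flags per Tax value.
out_filter = (df.set_index("Tax").filter(like="G").eq(1)
                .groupby(level="Tax").sum())

# Variant 2: same count, with the columns listed explicitly.
out_agg = df.groupby("Tax")[cols].agg(lambda x: x.eq(1).sum())

# Variant 3: a plain sum, valid only if the G columns hold nothing but 0/1.
out_sum = df.groupby("Tax")[cols].sum()

# Per-column counts keep every Tax value, including those with a count of 0.
g1_counts = out_sum["G1"]

print(out_filter)
print(g1_counts)
```

Because the grouping is done on Tax alone, every Tax value present in the data gets a row, so a group with no 1 values shows up as 0 instead of disappearing (here, G1 for Tax B).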