
Pandas - Group by with same dimension


I have a dataframe with five columns: G1, G2, G3, G4 and Tax. For each G column, I group by that column and Tax to count how many rows have the value 1. But the dimension of the result is different for each G, because some G columns only have 0 values.

This is my code:

self.instance.df.groupby(["G1", "Tax"]).size().drop(0)
self.instance.df.groupby(["G2", "Tax"]).size().drop(0)

I'm dropping index 0 because I only need the counts for the value 1.

INITIAL DATAFRAME:

       G1  G2  G3  G4  Tax
   0   0   1   0   0   1.1
   1   1   1   0   1   1.1
   2   1   1   0   0   1.2
   3   0   1   0   1   1.3

The result I'm expecting is:

G1  Tax
1   1.1   5
    1.2   7
    1.5   3
    1.7   0

And the result I'm getting:

 G1 Tax
 1  1.1   5
    1.2   7
    1.5   3

Solution

  • If I understand correctly, you can greatly simplify your logic by performing a single groupby:

    out = (df.set_index('Tax').filter(like='G').eq(1)
             .groupby(level='Tax').sum()
           )
    

    Or:

    cols = ['G1', 'G2', 'G3', 'G4']
    
    out = df.groupby('Tax')[cols].agg(lambda x: x.eq(1).sum())
    

    And if your values are only 0/1:

    cols = ['G1', 'G2', 'G3', 'G4']
    
    out = df.groupby('Tax')[cols].sum()
    

    Example:

    # input
       G1  G2  G3  G4 Tax
    0   0   1   0   0   A
    1   1   1   0   1   A
    2   1   1   0   0   A
    3   0   1   0   1   B
    
    # output
         G1  G2  G3  G4
    Tax                
    A     2   3   0   1
    B     0   1   0   1
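
For reference, here is a self-contained sketch that rebuilds the example input above and reproduces the output. It uses a slightly different spelling of the same idea (comparing to 1 first, then grouping by the Tax column); the variable names `df`, `cols`, `out` and `g3` are only illustrative.

```python
import pandas as pd

# Sample data copied from the example input above
df = pd.DataFrame({
    "G1": [0, 1, 1, 0],
    "G2": [1, 1, 1, 1],
    "G3": [0, 0, 0, 0],
    "G4": [0, 1, 0, 1],
    "Tax": ["A", "A", "A", "B"],
})

cols = ["G1", "G2", "G3", "G4"]

# True where a cell equals 1, then count the True values per Tax group.
# Grouping only by Tax keeps every Tax value in the index, even when
# a G column contains no 1 for that group.
out = df[cols].eq(1).groupby(df["Tax"]).sum()
print(out)

# A single column has the same index as all the others, zeros included.
g3 = out["G3"]
print(g3)
```

Because the grouping key is only Tax, each G column gets one row per Tax value, so the results all have the same dimension and a count of 0 is kept instead of being dropped.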