pythonpandasgroup-by

Min and Max value on multiple cells group by third column value


I would like to extract the MIN and MAX from multiple columns (start_1, end_1, start_2, end_2) group by "Name"

I have data that looks like this:

start_1  end_1  start_2  end_2  name
 100      200    300      400    ABC
 100      200    300      400    ABC
 150      250    300      400    ABC
 300      200    300      900    DEF
 50       200    300      1000   DEF

The output should be like this:

start  end  name
 100   400  ABC
 50    1000 DEF

Looked into following answers already: Group by pandas Column

using-pandas-want-to-group-by-multiple-columns-for-min-max-and-add-another-colu

Looking forward for your asistance


Solution

  • Use wide_to_long for reshape and then aggregate min and max:

    out = (pd.wide_to_long(df.reset_index(),
                           stubnames=['start','end'],
                           i=['index', 'name'],
                           j=' ',
                           sep="_")
             .groupby('name')
             .agg(start=('start', 'min'), end=('end', 'max'))
             .reset_index())
    print (out)
      name  start   end
    0  ABC    100   400
    1  DEF     50  1000
    

    How it working:

    print(pd.wide_to_long(df.reset_index(),
                          stubnames=['start','end'],
                          i=['index', 'name'],
                          j=' ',
                          sep="_"))
    
                  start   end
    index name               
    0     ABC  1    100   200
               2    300   400
    1     ABC  1    100   200
               2    300   400
    2     ABC  1    150   250
               2    300   400
    3     DEF  1    300   200
               2    300   900
    4     DEF  1     50   200
               2    300  1000
    

    Another idea is create minimal and maximal columns first and then aggregate:

    out = (df.assign(start = df.filter(like='start').min(axis=1),
                     end = df.filter(like='end').max(axis=1))
            .groupby('name')
            .agg(start=('start', 'min'), end=('end', 'max'))
            .reset_index())
    
    print (out)
      name  start   end
    0  ABC    100   400
    1  DEF     50  1000
    

    How it working:

    print(df.assign(start = df.filter(like='start').min(axis=1),
                    end = df.filter(like='end').max(axis=1)))
    
       start_1  end_1  start_2  end_2 name  start   end
    0      100    200      300    400  ABC    100   400
    1      100    200      300    400  ABC    100   400
    2      150    250      300    400  ABC    150   400
    3      300    200      300    900  DEF    300   900
    4       50    200      300   1000  DEF     50  1000