pythonpandasgroup-by

Min and Max value on multiple cells group by third column value


I would like to extract the MIN and MAX from multiple columns (start_1, end_1, start_2, end_2) group by "Name"

I have data that looks like this:

start_1  end_1  start_2  end_2  name
 100      200    300      400    ABC
 100      200    300      400    ABC
 150      250    300      400    ABC
 300      200    300      900    DEF
 50       200    300      1000   DEF

The output should be like this:

start  end  name
 100   400  ABC
 50    1000 DEF

Looked into following answers already: Group by pandas Column

using-pandas-want-to-group-by-multiple-columns-for-min-max-and-add-another-colu

Looking forward for your asistance


Solution

  • Another possible solution:

    d = df.set_index('name')
    (pd.concat([
        d.filter(like='start').groupby(level=0).min().min(axis=1).rename('start'),
        d.filter(like='end').groupby(level=0).max().max(axis=1).rename('end')], 
               axis=1, names=['start', 'end'])
     .reset_index())
    

    This first sets the name column as the index (set_index), then uses concat to combine two computed dataframes: one for the minimum start values and another for the maximum end values. The filter method isolates columns containing start or end, and groupby (with level=0) groups by the index (name). The inner min() or max() aggregates within groups, while the outer min(axis=1) or max(axis=1) computes the row-wise minimum/maximum across columns. Finally, reset_index restores name as a column.

    We can avoid executing groupby twice in the following way:

    d = df.set_index('name')
    col_funcs = {'start': 'min', 'end': 'max'}
    (pd.concat(
        [d.filter(like=k).agg(v, axis=1).rename(k) for k, v in col_funcs.items()],
        axis=1)
     .groupby(level=0).agg(col_funcs)
     .reset_index())
    

    Output:

      name  start   end
    0  ABC    100   400
    1  DEF     50  1000