I would like to extract the MIN and MAX from multiple columns (start_1, end_1, start_2, end_2) group by "Name"
I have data that looks like this:
start_1 end_1 start_2 end_2 name
100 200 300 400 ABC
100 200 300 400 ABC
150 250 300 400 ABC
300 200 300 900 DEF
50 200 300 1000 DEF
The output should be like this:
start end name
100 400 ABC
50 1000 DEF
Looked into following answers already: Group by pandas Column
using-pandas-want-to-group-by-multiple-columns-for-min-max-and-add-another-colu
Looking forward for your asistance
Another possible solution:
d = df.set_index('name')
(pd.concat([
d.filter(like='start').groupby(level=0).min().min(axis=1).rename('start'),
d.filter(like='end').groupby(level=0).max().max(axis=1).rename('end')],
axis=1, names=['start', 'end'])
.reset_index())
This first sets the name
column as the index (set_index
), then uses concat
to combine two computed dataframes: one for the minimum start
values and another for the maximum end
values. The filter
method isolates columns containing start
or end
, and groupby
(with level=0
) groups by the index (name
). The inner min()
or max()
aggregates within groups, while the outer min(axis=1)
or max(axis=1)
computes the row-wise minimum/maximum across columns. Finally, reset_index
restores name
as a column.
We can avoid executing groupby
twice in the following way:
d = df.set_index('name')
col_funcs = {'start': 'min', 'end': 'max'}
(pd.concat(
[d.filter(like=k).agg(v, axis=1).rename(k) for k, v in col_funcs.items()],
axis=1)
.groupby(level=0).agg(col_funcs)
.reset_index())
Output:
name start end
0 ABC 100 400
1 DEF 50 1000