I would like to extract the MIN and MAX from multiple columns (start_1, end_1, start_2, end_2) group by "Name"
I have data that looks like this:
start_1 end_1 start_2 end_2 name
100 200 300 400 ABC
100 200 300 400 ABC
150 250 300 400 ABC
300 200 300 900 DEF
50 200 300 1000 DEF
The output should be like this:
start end name
100 400 ABC
50 1000 DEF
Looked into following answers already: Group by pandas Column
using-pandas-want-to-group-by-multiple-columns-for-min-max-and-add-another-colu
Looking forward for your asistance
Use wide_to_long
for reshape and then aggregate min
and max
:
out = (pd.wide_to_long(df.reset_index(),
stubnames=['start','end'],
i=['index', 'name'],
j=' ',
sep="_")
.groupby('name')
.agg(start=('start', 'min'), end=('end', 'max'))
.reset_index())
print (out)
name start end
0 ABC 100 400
1 DEF 50 1000
How it working:
print(pd.wide_to_long(df.reset_index(),
stubnames=['start','end'],
i=['index', 'name'],
j=' ',
sep="_"))
start end
index name
0 ABC 1 100 200
2 300 400
1 ABC 1 100 200
2 300 400
2 ABC 1 150 250
2 300 400
3 DEF 1 300 200
2 300 900
4 DEF 1 50 200
2 300 1000
Another idea is create minimal and maximal columns first and then aggregate:
out = (df.assign(start = df.filter(like='start').min(axis=1),
end = df.filter(like='end').max(axis=1))
.groupby('name')
.agg(start=('start', 'min'), end=('end', 'max'))
.reset_index())
print (out)
name start end
0 ABC 100 400
1 DEF 50 1000
How it working:
print(df.assign(start = df.filter(like='start').min(axis=1),
end = df.filter(like='end').max(axis=1)))
start_1 end_1 start_2 end_2 name start end
0 100 200 300 400 ABC 100 400
1 100 200 300 400 ABC 100 400
2 150 250 300 400 ABC 150 400
3 300 200 300 900 DEF 300 900
4 50 200 300 1000 DEF 50 1000