Please help me filter my dataframe for a condition, that should be fulfilled in any two columns. Imagine a list of students with grades in different sports. I want to filter the list so the new list passed_students shows only those who have scored a 4 or greater in at least two different sports.
Students = {
"Names": ["Tom", "Rick", "Sally","Sarah"],
"Football": [4, 5, 2,1],
"Basketball": [4, 2, 4,2],
"Volleyball": [6, 1, 6,1],
"Foosball": [4, 3, 4,3],
}
The Code should return this:
passed_Students = {
"Names": ["Tom", "Sally"],
"Football": [4,2],
"Basketball": [4,4],
"Volleyball": [6,6],
"Foosball": [4,4],
}
I can make it work if one grade above 4 is sufficient:
import numpy as np
import pandas as pd
Students = {
"Names": ["Tom", "Rick", "Sally","Sarah"],
"Football": [4, 5, 2,1],
"Basketball": [4, 2, 4,2],
"Volleyball": [6, 1, 6,1],
"Foosball": [4, 3, 4,3],
}
Students = pd.DataFrame(Students)
passed_Students= Students[(Students["Football"]>3) |(Students["Basketball"]>3)|(Students["Volleyball"]>3)|(Students["Foosball"]>3) ]
print(passed_Students)
This returns:
Students = {
"Names": ["Tom", "Rick", "Sally"],
"Football": [4, 5, 2],
"Basketball": [4, 2, 4],
"Volleyball": [6, 1, 6],
"Foosball": [4, 3, 4],
}
But how can I make it such, that any two grades of 4 or above qualify for passed_students, thereby returning only this?
passed_Students = {
"Names": ["Tom", "Sally"],
"Football": [4,2],
"Basketball": [4,4],
"Volleyball": [6,6],
"Foosball": [4,4],
}
drop
the "Names", then compare to 4
with ge
, sum
to count the number of True
per row and filter with boolean indexing:
passed_Students = Students[Students.drop(columns=['Names'])
.ge(4).sum(axis=1).ge(2)]
Output:
Names Football Basketball Volleyball Foosball
0 Tom 4 4 6 4
2 Sally 2 4 6 4
Intermediates:
# Students.drop(columns=['Names']).ge(4)
Football Basketball Volleyball Foosball
0 True True True True
1 True False False False
2 False True True True
3 False False False False
# Students.drop(columns=['Names']).ge(4).sum(axis=1)
0 4
1 1
2 3
3 0
dtype: int64
# Students.drop(columns=['Names']).ge(4).sum(axis=1).ge(2)
0 True
1 False
2 True
3 False
dtype: bool