pythonpandas

Select rows if a condition is met in any two columns


Please help me filter my dataframe for a condition, that should be fulfilled in any two columns. Imagine a list of students with grades in different sports. I want to filter the list so the new list passed_students shows only those who have scored a 4 or greater in at least two different sports.

Students = {
  "Names": ["Tom", "Rick", "Sally","Sarah"],
  "Football": [4, 5, 2,1],
  "Basketball": [4, 2, 4,2],
  "Volleyball": [6, 1, 6,1],
  "Foosball": [4, 3, 4,3],
}

The Code should return this:

passed_Students = {
  "Names": ["Tom", "Sally"],
  "Football": [4,2],
  "Basketball": [4,4],
  "Volleyball": [6,6],
  "Foosball": [4,4],
}

I can make it work if one grade above 4 is sufficient:

import numpy as np
import pandas as pd


Students = {
  "Names": ["Tom", "Rick", "Sally","Sarah"],
  "Football": [4, 5, 2,1],
  "Basketball": [4, 2, 4,2],
  "Volleyball": [6, 1, 6,1],
  "Foosball": [4, 3, 4,3],
}
Students = pd.DataFrame(Students)


passed_Students= Students[(Students["Football"]>3) |(Students["Basketball"]>3)|(Students["Volleyball"]>3)|(Students["Foosball"]>3) ]

print(passed_Students)

This returns:

Students = {
  "Names": ["Tom", "Rick", "Sally"],
  "Football": [4, 5, 2],
  "Basketball": [4, 2, 4],
  "Volleyball": [6, 1, 6],
  "Foosball": [4, 3, 4],
}

But how can I make it such, that any two grades of 4 or above qualify for passed_students, thereby returning only this?

passed_Students = {
  "Names": ["Tom", "Sally"],
  "Football": [4,2],
  "Basketball": [4,4],
  "Volleyball": [6,6],
  "Foosball": [4,4],
}

Solution

  • drop the "Names", then compare to 4 with ge, sum to count the number of True per row and filter with boolean indexing:

    passed_Students = Students[Students.drop(columns=['Names'])
                                       .ge(4).sum(axis=1).ge(2)]
    

    Output:

       Names  Football  Basketball  Volleyball  Foosball
    0    Tom         4           4           6         4
    2  Sally         2           4           6         4
    

    Intermediates:

    # Students.drop(columns=['Names']).ge(4)
       Football  Basketball  Volleyball  Foosball
    0      True        True        True      True
    1      True       False       False     False
    2     False        True        True      True
    3     False       False       False     False
    
    # Students.drop(columns=['Names']).ge(4).sum(axis=1)
    0    4
    1    1
    2    3
    3    0
    dtype: int64
    
    # Students.drop(columns=['Names']).ge(4).sum(axis=1).ge(2)
    0     True
    1    False
    2     True
    3    False
    dtype: bool