
Filtering pandas dataframe with multiple Boolean columns


I am trying to filter a df using several Boolean variables that are a part of the df, but have been unable to do so.

Sample data:

A | B | C | D
John Doe | 45 | True | False
Jane Smith | 32 | False | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True

The dtype for columns C and D is Boolean. I want to create a new df (df1) with only the rows where either C or D is True. It should look like this:

A | B | C | D
John Doe | 45 | True | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True

I've tried something like this, which fails because it can't handle the Boolean type:

df1 = df[(df['C']=='True') or (df['D']=='True')]

Any ideas?


Solution

    In [82]: d
    Out[82]:
                 A   B      C      D
    0     John Doe  45   True  False
    1   Jane Smith  32  False  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Solution 1:

    In [83]: d.loc[d.C | d.D]
    Out[83]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Solution 2:

    In [94]: d[d[['C','D']].any(axis=1)]
    Out[94]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Solution 3:

    In [95]: d.query("C or D")
    Out[95]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
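
The three solutions above can be reproduced outside an interactive session. The following is a minimal sketch, assuming the sample data from the question; the variable names `by_or`, `by_any` and `by_query` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question; `d` mirrors the session above.
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Solution 1: element-wise OR of the two Boolean columns.
by_or = d.loc[d["C"] | d["D"]]

# Solution 2: row-wise any() over the selected Boolean columns.
by_any = d[d[["C", "D"]].any(axis=1)]

# Solution 3: query string; `or` is evaluated element-wise here.
by_query = d.query("C or D")

# All three approaches should select the same rows.
same_rows = by_or.equals(by_any) and by_or.equals(by_query)
print(by_or)
```

Solution 2 is probably the most convenient when there are many Boolean columns, since the column list can be built programmatically.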
    

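    PS If you change your solution to compare against the Boolean True (not the string 'True') and to combine the two masks with | (not or):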

    df[(df['C']==True) | (df['D']==True)]
    

    it'll work too
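
The attempt in the question has two separate problems, which the sketch below tries to isolate. It assumes a cut-down frame containing only columns C and D; the names `string_mask`, `or_failed` and `fixed` are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Problem 1: the string 'True' is not the Boolean True,
# so this comparison matches no rows at all.
string_mask = df["C"] == "True"

# Problem 2: Python's `or` needs a single truth value for each operand,
# and pandas raises ValueError when asked for the truth value of a Series.
try:
    df[(df["C"] == True) or (df["D"] == True)]
    or_failed = False
except ValueError:
    or_failed = True

# Fixed version: Boolean comparison plus the element-wise `|` operator.
fixed = df[(df["C"] == True) | (df["D"] == True)]
print(fixed)
```

The parentheses around each comparison matter, because `|` binds more tightly than `==` in Python.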

    Pandas docs - boolean indexing


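    The following example shows why we should NOT use the "PEP 8 compliant" df["col_name"] is True in place of df["col_name"] == True (an issue raised in the comments). The is operator tests whether the Series object itself is the singleton True, so it returns one plain False instead of an element-wise mask.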

    In [11]: df = pd.DataFrame({"col":[True, True, True]})
    
    In [12]: df
    Out[12]:
        col
    0  True
    1  True
    2  True
    
    In [13]: df["col"] is True
    Out[13]: False               # <----- oops, that's not exactly what we wanted
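
For contrast, here is a small sketch of what the element-wise comparison returns for the same kind of column. The frame and the names `identity`, `elementwise` and `filtered` are assumptions made for this illustration. It also shows that a Boolean column can be used as a mask directly, which sidesteps the `== True` versus `is True` question.

```python
import pandas as pd

df = pd.DataFrame({"col": [True, False, True]})

# Identity check against the singleton True: one plain Python bool.
identity = df["col"] is True

# Element-wise comparison: a Boolean Series with one value per row.
elementwise = df["col"] == True

# A Boolean column is already a valid mask, so no comparison is needed.
filtered = df[df["col"]]
print(filtered)
```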