pandas dataframe indexing keyerror

Pandas: KeyError using df.loc with condition


I am using the following program, which is meant to select the rows of a DataFrame whose 'Num' list contains all of the given numbers.

import pandas as pd 
data = {'Num': [[1,2,100], [10,20,30], [1,2,30],[1,2,200],[4,0,9]],'Id':range(5)}
df = pd.DataFrame(data)
numbers = [1,2]
filtered_df = df.loc[all(n in df['Num'] for n in numbers)]
print(filtered_df)

I have the following error:

raise KeyError(
KeyError: 'True: boolean label can not be used without a boolean index'

I do not understand the reason for this error, because if I replace the filtered_df line with:

filtered_df = df.loc[df['Num'].apply(lambda c : all(n in c for n in numbers))]

The program works well. Can you please explain the error and how to correct the first program?


Solution

  • The built-in all function reduces an iterable of booleans to a single boolean. In your first program, the expression passed to df.loc is therefore one value for the whole DataFrame, not one value per row:

    all(n in df['Num'] for n in numbers)
    > True
    

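    It evaluates to True because n in df['Num'] tests membership in the Series index (the row labels 0 to 4), not in the lists stored in the column, and both 1 and 2 happen to be index labels. df.loc then receives the scalar True and tries to look it up as a row label, which raises the KeyError because the index is not boolean. In your second example, df['Num'] is a Series, so Series.apply calls the lambda once per element (one list per row) and returns a boolean Series aligned with the DataFrame index, which df.loc accepts as a row mask.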
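
To make the difference concrete, here is a minimal sketch that reuses the data from the question. It computes the scalar that the first program passes to df.loc, then builds a per-row boolean mask instead. The names scalar, mask, mask_list and filtered_alt are illustrative only, and the set-based variant is just one possible alternative; it assumes the values in each list are hashable.

```python
import pandas as pd

data = {'Num': [[1, 2, 100], [10, 20, 30], [1, 2, 30], [1, 2, 200], [4, 0, 9]],
        'Id': range(5)}
df = pd.DataFrame(data)
numbers = [1, 2]

# `in` on a Series checks the index labels (0..4), not the stored lists,
# so this collapses to a single Python bool for the whole DataFrame.
scalar = all(n in df['Num'] for n in numbers)

# One boolean per row: test membership inside each row's list.
mask = df['Num'].apply(lambda c: all(n in c for n in numbers))
filtered_df = df.loc[mask]

# Alternative (assumes hashable values): a plain list of booleans,
# one per row, also works as a mask for df.loc.
mask_list = [set(numbers).issubset(c) for c in df['Num']]
filtered_alt = df.loc[mask_list]

print(filtered_df)
```

The key point is that df.loc needs a boolean sequence with one entry per row; any construction that yields that (apply, a list comprehension, and so on) should work.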