I am using the following program, which should select the rows of a DataFrame whose 'Num' list contains all of the given numbers:
import pandas as pd
data = {'Num': [[1,2,100], [10,20,30], [1,2,30],[1,2,200],[4,0,9]],'Id':range(5)}
df = pd.DataFrame(data)
numbers = [1,2]
filtered_df = df.loc[all(n in df['Num'] for n in numbers)]
print(filtered_df)
I have the following error:
KeyError: 'True: boolean label can not be used without a boolean index'
I do not understand the reason for this error, because if I replace the filtered_df line with:
filtered_df = df.loc[df['Num'].apply(lambda c : all(n in c for n in numbers))]
The program works well. Can you please explain the error and how to correct the first program?
The all function aggregates several boolean values into a single value. In your first program, the expression you pass to df.loc to select the rows where your condition is true is:
all(n in df['Num'] for n in numbers)
> True
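It evaluates to True because n in df['Num'] tests the index labels of the Series (0 to 4 here), not the lists stored in the column, and both 1 and 2 happen to be index labels. Your line is therefore equivalent to df.loc[True]. You cannot select rows with a single boolean: .loc treats True as a row label, and the index is not boolean, hence the KeyError.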
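To make this concrete, here is a minimal sketch that reuses the data from the question and inspects the intermediate values. The names label_hits, single_flag and raised are only illustrative and do not appear in the original program.

```python
import pandas as pd

data = {'Num': [[1, 2, 100], [10, 20, 30], [1, 2, 30], [1, 2, 200], [4, 0, 9]],
        'Id': range(5)}
df = pd.DataFrame(data)
numbers = [1, 2]

# `in` on a Series tests its index labels (0..4 here), not the stored lists
label_hits = [n in df['Num'] for n in numbers]

# all(...) collapses those checks into one plain Python bool
single_flag = all(label_hits)

# .loc then receives a single boolean and treats it as a row label
try:
    df.loc[single_flag]
    raised = False
except KeyError as exc:
    raised = True
    print(exc)
```

Note that 100 in df['Num'] is False even though 100 is stored in the first row, which shows that the values are never inspected.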
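In your second example, the lambda function is applied to each element of the 'Num' column, as per the definition of Series.apply(...). The result is a boolean Series with one value per row, which df.loc accepts as a mask.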
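As a sketch of the corrected program, the mask can be built first and then passed to df.loc. The variable names mask, wanted and mask_alt are illustrative; the set.issubset variant is one possible alternative, not something taken from the question.

```python
import pandas as pd

data = {'Num': [[1, 2, 100], [10, 20, 30], [1, 2, 30], [1, 2, 200], [4, 0, 9]],
        'Id': range(5)}
df = pd.DataFrame(data)
numbers = [1, 2]

# One boolean per row: does this row's list contain every wanted number?
mask = df['Num'].apply(lambda c: all(n in c for n in numbers))

# A boolean Series aligned with the index is a valid row selector for .loc
filtered_df = df.loc[mask]
print(filtered_df)

# Alternative with the same result: test that the wanted set is a subset of each list
wanted = set(numbers)
mask_alt = df['Num'].apply(wanted.issubset)
```

Both masks have the same length as the DataFrame, which is what .loc needs in order to decide row by row what to keep.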