
Difference between .find() and 'in' operator in Python


I'm working on a pandas DataFrame called filteredDS.

The aim:

Find all rows whose question column contains the word 'King'.

When I add the column king_quest using the in operator like this:

filteredDS['king_quest'] = filteredDS.question.apply(lambda x: x if ' King ' in x else None).reset_index(drop = True)
filtered_king_df = filteredDS[~filteredDS.king_quest.isnull()].reset_index()
print(filtered_king_df)

I get a DataFrame with about 2000 rows. When I add it using the .find() method like this:

filteredDS['king_quest'] = filteredDS.question.apply(lambda x: x if x.find('king') else None).reset_index(drop = True)
filtered_king_df = filteredDS[~filteredDS.king_quest.isnull()].reset_index()
print(filtered_king_df)

I get a DataFrame with about 3000 rows.

Note: in both cases, each row in the question column contains the word 'king'.

Could you tell me why this is happening?


Solution

  • There might be multiple issues here.

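    1. The two statements search for different strings: ' King ' (with surrounding spaces and a capital K) in one, and just 'king' in the other. Both in and find are case-sensitive, so 'king' does not match 'King'. Also, ' King ' needs a space on both sides, so it misses 'King' at the very start or end of a question or next to punctuation, while 'king' also matches inside longer words such as 'smoking'.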

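    2. x.find('king') returns the index of the first match, or -1 if there is none. Because -1 is truthy and 0 is falsy, x if x.find('king') else None keeps every question except those that start with 'king', including questions that do not contain 'king' at all, so the condition is true for nearly every string. To use find as a check, compare x.find('king') != -1 (x.find('king') > 0 would miss a match at index 0), but that is not as intuitive as 'king' in x.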
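A small plain-Python sketch of both points, using four made-up sample strings (not the asker's data) chosen to hit each edge case: a capitalised 'King', 'king' at index 0, 'king' inside another word, and no 'king' at all. It compares the truthiness test from the question against a correct find check and the in operator.

```python
# Made-up sample strings, chosen only to illustrate the edge cases
questions = [
    "Who was the King of France in 1700?",  # only a capitalised 'King'
    "king of the hill is a game",            # 'king' at index 0
    "Name the smoking habit of the era",     # 'king' inside 'smoking'
    "What is the capital of Peru?",          # no 'king' at all
]

# What the question's lambda effectively does: -1 is truthy, 0 is falsy
buggy = [q for q in questions if q.find("king")]

# find used correctly: -1 is the only "not found" value
fixed = [q for q in questions if q.find("king") != -1]

# '> 0' looks reasonable but misses a match at index 0
greater_than_zero = [q for q in questions if q.find("king") > 0]

# Idiomatic membership test, equivalent to the fixed find check
with_in = [q for q in questions if "king" in q]

# The other lambda's test: case-sensitive and requires spaces on both sides
spaced = [q for q in questions if " King " in q]

for name, rows in [("buggy", buggy), ("fixed", fixed),
                   ("> 0", greater_than_zero), ("in", with_in),
                   ("' King '", spaced)]:
    print(f"{name:>9}: {rows}")
```

The buggy version keeps the two strings with no lowercase 'king' and drops the one that actually starts with it, which is the opposite of what was intended.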
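For the original aim (rows whose question mentions 'King'), a vectorized boolean mask may be simpler than building a helper column with apply. This is a sketch under assumptions: the DataFrame below is a hypothetical stand-in for filteredDS with made-up rows, the match is assumed to be case-insensitive, and the whole-word variant uses a regex word boundary, which is a swapped-in technique rather than part of the answer above.

```python
import pandas as pd

# Hypothetical stand-in for filteredDS; only the 'question' column matters here
filteredDS = pd.DataFrame({"question": [
    "Who was the King of France in 1700?",
    "king of the hill is a game",
    "Name the smoking habit of the era",
    "What is the capital of Peru?",
]})

# Plain substring test, case-insensitive; na=False treats missing questions as non-matches
mask_any = filteredDS["question"].str.contains("king", case=False, regex=False, na=False)

# Whole-word test: \b word boundaries stop 'smoking' from matching
mask_word = filteredDS["question"].str.contains(r"\bking\b", case=False, regex=True, na=False)

# Filter directly with the mask instead of a helper column plus isnull()
filtered_king_df = filteredDS[mask_word].reset_index(drop=True)
print(filtered_king_df)
```

Whether the substring or the whole-word mask is appropriate depends on whether matches inside other words should count.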