I'm working with a pandas DataFrame called filteredDS.
The aim: find all rows whose question column contains the word 'King'.
When I add the column king_quest using the in operator like this:
```python
filteredDS['king_quest'] = filteredDS.question.apply(lambda x: x if ' King ' in x else None).reset_index(drop = True)
filtered_king_df = filteredDS[~filteredDS.king_quest.isnull()].reset_index()
print(filtered_king_df)
```
I get a DataFrame with about 2000 rows. But when I add it using the str.find() method like this:
```python
filteredDS['king_quest'] = filteredDS.question.apply(lambda x: x if x.find('king') else None).reset_index(drop = True)
filtered_king_df = filteredDS[~filteredDS.king_quest.isnull()].reset_index()
print(filtered_king_df)
```
I get a DataFrame with about 3000 rows.
Note: in both cases, every row in the question column contains the word 'king'.
Could you tell me why this is happening?
There might be multiple issues here.
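The two versions search for different strings: ' King ' (capital K, with a space on each side) in one, and 'king' (lowercase, no spaces) in the other. Both the in operator and find() are case-sensitive, so the two checks match different sets of rows.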
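A quick way to see the difference is to run both checks on a few sample strings. The strings below are made up for illustration and are not from filteredDS; lowercasing the text before searching is one simple way to make the test case-insensitive.

```python
# Hypothetical sample questions standing in for filteredDS.question
questions = [
    "Who was the King of France?",   # contains ' King ' (capital K, spaces around it)
    "The king is dead",              # lowercase 'king' only
    "King Arthur pulled the sword",  # 'King' at the very start: no leading space
    "Kingdoms rise and fall",        # 'King' inside a longer word
]

# Check from the first version: case-sensitive, needs a space on both sides
has_spaced_king = [' King ' in q for q in questions]

# What find('king') is looking for: case-sensitive lowercase substring
has_lower_king = ['king' in q for q in questions]

# Case-insensitive substring check: lowercase the text first
has_any_king = ['king' in q.lower() for q in questions]

print(has_spaced_king)  # [True, False, False, False]
print(has_lower_king)   # [False, True, False, False]
print(has_any_king)     # [True, True, True, True]
```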
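More importantly, x.find('king') returns the index of the first match, or -1 if there is none. Used directly as a condition, that value is misleading: -1 is truthy, so rows without 'king' are kept, while 0 is falsy, so only rows that start with 'king' are dropped. That is why the find() version returns more rows. If you want to use find() for this check, compare against -1, i.e. x.find('king') != -1 (a match at index 0 is still a match), but that is not as readable as 'king' in x.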
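The sketch below shows the find() pitfall on a small, hypothetical stand-in for filteredDS (the real data isn't shown). It then filters with pandas' vectorized Series.str.contains, using case=False and regex=False so it behaves like a case-insensitive 'king' in x; that is a suggested alternative, not what the question used. Building the boolean mask directly also avoids the intermediate king_quest column and the .reset_index(drop = True) on the applied Series, which can misalign values when filteredDS does not have a default 0..n-1 index, because column assignment aligns on index labels.

```python
import pandas as pd

# Hypothetical stand-in for filteredDS; the real data is not shown in the question
filteredDS = pd.DataFrame({
    'question': [
        "king of the hill",    # match at index 0
        "Who was the King?",   # matches only if case is ignored
        "the king is dead",    # match at index 4
        "no royalty here",     # no match at all
    ]
})

# str.find returns the index of the first match, or -1 when there is none
positions = [q.find('king') for q in filteredDS['question']]
print(positions)  # [0, -1, 4, -1]

# As a bare condition: -1 is truthy and 0 is falsy, so non-matching rows
# are kept and the match at index 0 is dropped
print([bool(p) for p in positions])  # [False, True, True, True]

# Comparing with -1 keeps matches at index 0 and drops rows with no match
print([p != -1 for p in positions])  # [True, False, True, False]

# Vectorized, case-insensitive substring filter (no regex), no helper column
mask = filteredDS['question'].str.contains('king', case=False, regex=False)
filtered_king_df = filteredDS[mask].reset_index(drop=True)
print(filtered_king_df)
```

With case=False, "Who was the King?" is kept as well; drop that argument if only lowercase 'king' should count.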