I've done some searching and can't figure out how to filter a dataframe by
df["col"].str.contains(word)
however I'm wondering if there is a way to do the reverse: filter a dataframe by that set's compliment. eg: to the effect of
!(df["col"].str.contains(word))
Can this be done through a DataFrame
method?
You can use the invert (~) operator (which acts like a not for boolean data):
new_df = df[~df["col"].str.contains(word)]
where new_df
is the copy returned by RHS.
contains also accepts a regular expression...
If the above throws a ValueError or TypeError, the reason is likely because you have mixed datatypes, so use na=False
:
new_df = df[~df["col"].str.contains(word, na=False)]
Or,
new_df = df[df["col"].str.contains(word) == False]