Tags: python, pandas, contains

Search for "does-not-contain" on a DataFrame in pandas


I've done some searching and found that I can filter a DataFrame by

df["col"].str.contains(word)

however, I'm wondering if there is a way to do the reverse: filter a DataFrame by that set's complement, e.g. something to the effect of

!(df["col"].str.contains(word))

Can this be done through a DataFrame method?


Solution

  • You can use the invert (~) operator, which acts as an element-wise NOT on boolean data:

    new_df = df[~df["col"].str.contains(word)]
    

    where new_df is the copy returned by the right-hand side.
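
As a minimal sketch of the line above, here is the inversion on a small made-up frame. The column values and the word "apple" are illustrative assumptions, not data from the question.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"col": ["apple pie", "banana split", "apple tart", "cherry cake"]}
)
word = "apple"

# Boolean Series: True where the string contains the word.
mask = df["col"].str.contains(word)

# ~ flips every True/False, so only the non-matching rows are kept.
new_df = df[~mask]
print(new_df)
```

Naming the mask first is only a readability choice; the one-liner from the answer does the same thing.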

    contains also accepts a regular expression...
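
To illustrate the regular-expression behaviour, the sketch below excludes rows matching a pattern, then repeats the filter with regex=False so the text is matched literally. The sample values and patterns are assumptions chosen for the demonstration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"col": ["apple pie", "banana split", "pineapple juice", "cherry cake"]}
)

# By default the argument is treated as a regular expression:
# drop rows that start with "apple" OR end with "cake".
pattern = r"^apple|cake$"
no_match = df[~df["col"].str.contains(pattern)]

# Regex metacharacters matter: "." matches any character unless
# regex=False is passed to request a plain substring search.
versions = pd.Series(["v1.5", "v105", "v2.0"])
not_regex_hit = versions[~versions.str.contains("1.5")]
not_literal_hit = versions[~versions.str.contains("1.5", regex=False)]

print(no_match)
print(not_regex_hit.tolist(), not_literal_hit.tolist())
```

If the search word may contain characters such as "." or "(", regex=False (or escaping the word with re.escape) avoids accidental pattern matches.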


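    If the above raises a ValueError or TypeError, the column most likely contains missing values (NaN) or non-string entries, for which contains returns NaN instead of True or False. Pass na=False to treat those entries as non-matches: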

    new_df = df[~df["col"].str.contains(word, na=False)]
    
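
A small sketch of the na=False case, assuming a column that mixes strings with a missing value and an integer (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical mixed column: strings, a missing value, and an integer.
df = pd.DataFrame({"col": ["apple pie", None, 42, "banana split"]})
word = "apple"

# Without na=False the mask holds missing values for rows 1 and 2,
# which cannot be inverted or used for boolean indexing.
# With na=False those rows count as "does not contain".
mask = df["col"].str.contains(word, na=False)

new_df = df[~mask]
print(new_df)
```

Note that the missing and non-string rows are kept in the result here, because "not a match" inverts to True. Passing na=True instead should treat them as matches, so the inversion would drop them.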

    Or,

    new_df = df[df["col"].str.contains(word) == False]
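
The two spellings are not quite interchangeable when the column has missing values. The sketch below compares them, assuming an object-dtype column with one NaN (the data is made up, and the dtype is set explicitly because other string dtypes may handle missing values differently):

```python
import numpy as np
import pandas as pd

# Hypothetical data; object dtype is forced so that contains()
# returns NaN for the missing entry.
df = pd.DataFrame(
    {"col": pd.Series(["apple pie", np.nan, "banana split"], dtype=object)}
)
word = "apple"

# NaN == False evaluates to False, so the NaN row is dropped.
via_eq = df[df["col"].str.contains(word) == False]

# na=False turns NaN into False, and ~ flips it to True,
# so the NaN row is kept.
via_invert = df[~df["col"].str.contains(word, na=False)]

print(via_eq.index.tolist(), via_invert.index.tolist())
```

Which one is preferable depends on whether rows with missing values should appear in the "does not contain" result.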