
How to drop rows from a pandas data frame that contain a particular string in a particular column?


I have a very large data frame in Python, and I want to drop all rows that have a particular string inside a particular column.

For example, I want to drop all rows that have the string "XYZ" as a substring in column C of the data frame.

Can this be done efficiently using the .drop() method?


Solution

  • pandas has vectorized string operations, so you don't need .drop() at all: build a boolean mask with str.contains and keep only the rows that don't match:

    In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))
    
    In [92]: df
    Out[92]:
       A          C
    0  5        foo
    1  3        bar
    2  5  fooXYZbar
    3  6        bat
    
    In [93]: df[~df.C.str.contains("XYZ")]
    Out[93]:
       A    C
    0  5  foo
    1  3  bar
    3  6  bat
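A slightly more defensive sketch of the same idea follows, also showing how to get the identical result with .drop() as the question asks. Two assumptions are worth stating: the example frame (including the missing value in C) is made up for illustration, and "XYZ" is meant as a literal substring. By default str.contains treats its pattern as a regular expression, so regex=False avoids surprises with characters like "." or "(". For missing values it returns NaN instead of True/False, which can break the ~ inversion; na=False treats them as non-matches.

```python
import pandas as pd

# Illustrative data only: column C includes a missing value to show the na= option.
df = pd.DataFrame({"A": [5, 3, 5, 6, 7],
                   "C": ["foo", "bar", "fooXYZbar", "bat", None]})

# True where C contains the literal substring "XYZ".
# regex=False: match plain text, not a regular expression.
# na=False: treat missing values as "does not contain", so the mask is purely boolean.
mask = df["C"].str.contains("XYZ", regex=False, na=False)

# Option 1: boolean indexing keeps the rows where the mask is False.
kept = df[~mask]

# Option 2: the same rows via .drop(), passing the index labels of the matching rows.
dropped = df.drop(df[mask].index)

print(kept)
```

Both options are vectorized, so either is fine on a large frame. Boolean indexing is usually the simpler choice: .drop() works by index label, so if the index contains duplicate labels it removes every row sharing a label with a matching row, not just the matching rows themselves.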