I have a very large data frame in python and I want to drop all rows that have a particular string inside a particular column.
For example, I want to drop all rows which have the string "XYZ" as a substring in the column C of the data frame.
Can this be implemented in an efficient way using .drop() method?
pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:
In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))
In [92]: df
Out[92]:
A C
0 5 foo
1 3 bar
2 5 fooXYZbar
3 6 bat
In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
A C
0 5 foo
1 3 bar
3 6 bat