
Is there a better way to check whether each element of a dataframe is contained in a given string?


Let's say we have a dataframe df representing the activities of some people as follows:

index Mary Tristan Louise Arnaud Justin Stacy
0 Engineer Software Engineer Rock Singer Rap Singer Lumberjack Biomedical Engineer
1 Guitarist Aerospace Engineer Author Firefighter
2 Business Man

And I would like to check if each activity is or might be software engineering. With s = 'Software Engineer', we would obtain:

index  Mary   Tristan  Louise  Arnaud  Justin  Stacy
0      True   True     False   False   False   False
1      False  False    False   False   False   False
2      False  False    False   False   False   False

This means I want to test, for every cell in df, whether it is a substring of s. What already works is the following, but it looks dirty:

s = 'Software Engineer'
df.apply(lambda col: col.apply(lambda x: str(x) in s))

What bothers me is the double apply; there must be a better solution, right?
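To make the question reproducible, here is a minimal sketch of the setup. The question does not say which columns the shorter rows 1 and 2 occupy, so the layout below is an assumption: values are placed left to right and missing cells are None.

```python
import pandas as pd

# Hypothetical layout: rows 1 and 2 are filled left to right, gaps are None.
df = pd.DataFrame(
    [
        ["Engineer", "Software Engineer", "Rock Singer", "Rap Singer", "Lumberjack", "Biomedical Engineer"],
        ["Guitarist", "Aerospace Engineer", "Author", "Firefighter", None, None],
        ["Business Man", None, None, None, None, None],
    ],
    columns=["Mary", "Tristan", "Louise", "Arnaud", "Justin", "Stacy"],
)

s = "Software Engineer"

# The double apply from the question: the outer apply visits each column,
# the inner apply visits each cell of that column.
result = df.apply(lambda col: col.apply(lambda x: str(x) in s))
print(result)
```

With None as the placeholder this reproduces the expected table, because str(None) is "None". If the gaps were empty strings instead, this version would report True for them, since '' is a substring of every string.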


Solution

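  • To check whether every cell in your dataframe is a substring of s, you need neither numpy nor a double apply; applymap calls a function on each cell:

    df.applymap(lambda cell: isinstance(cell, str) and cell != '' and cell in s)

    Note: isinstance(cell, str) marks None and NaN cells as False, and cell != '' does the same for empty strings (the empty string is a substring of every string). A plain bool(cell) guard is not enough: bool(float('nan')) is True, so a NaN cell would reach the in test and raise a TypeError. In pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map.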
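A sketch of the cell-wise check on hypothetical sample data (deliberately mixing None, NaN and an empty string), together with one possible fully vectorized alternative that is not part of the original answer: since s is fixed, you can enumerate all of its non-empty substrings once and let DataFrame.isin do the lookup. For a string of length n that is at most n(n+1)/2 candidates, which is small for something like 'Software Engineer'.

```python
import numpy as np
import pandas as pd

s = "Software Engineer"

# Hypothetical sample data; missing cells are None, NaN and "" on purpose.
df = pd.DataFrame(
    {
        "Mary": ["Engineer", "Guitarist", "Business Man"],
        "Tristan": ["Software Engineer", "Aerospace Engineer", None],
        "Stacy": ["Biomedical Engineer", np.nan, ""],
    }
)

def is_substring_of_s(cell):
    # Only non-empty strings count; None, NaN (a float) and "" map to False.
    return isinstance(cell, str) and cell != "" and cell in s

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    elementwise = df.map(is_substring_of_s)
else:
    elementwise = df.applymap(is_substring_of_s)

# Alternative: precompute every non-empty substring of s, then one isin lookup.
substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
vectorized = df.isin(list(substrings))

print(elementwise)
```

Both approaches give the same table here. The substring set grows quadratically with the length of s, so the isin trick suits short strings; for long ones the cell-wise function is simpler.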

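    Conversely, if you want to check whether s is a substring of each cell, you can use pandas' vectorized string methods. Pass regex=False so that s is matched literally rather than as a regular expression, and na=False so that missing cells come out as False:

    df.apply(lambda column: column.str.contains(s, regex=False, na=False))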
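A small sketch of the reverse check, again on hypothetical data. Without regex=False, characters such as +, . or ( in s would be interpreted as regex syntax; with na=False, NaN cells become False instead of propagating as NaN.

```python
import numpy as np
import pandas as pd

s = "Software Engineer"

# Hypothetical data: here we ask whether s occurs inside each cell.
df = pd.DataFrame(
    {
        "Tristan": ["Senior Software Engineer", "Aerospace Engineer"],
        "Stacy": ["Engineer", np.nan],
    }
)

# Column by column via the .str accessor: regex=False matches s literally,
# na=False reports missing cells as False.
contains_s = df.apply(lambda column: column.str.contains(s, regex=False, na=False))
print(contains_s)
```

Every column must hold strings (object or string dtype); a column containing only NaN is stored as float, and the .str accessor raises an AttributeError on it.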