pandas string selection


I would like to extract the rows that contain a particular string as a whole word. The string may appear as one token within a larger, space-separated value (which should count as a match), or it may be embedded in another continuous string (which should NOT count). It can occur at the start, middle, or end of the value.

Example - say I would like to extract any row containing "HC":

import pandas as pd

df = pd.DataFrame({'test': ['HC', 'CHC', 'HC RD', 'RD', 'MRD', 'CEA', 'CEA HC']})

    test
0   HC
1   CHC
2   HC RD
3   RD
4   MRD
5   CEA
6   CEA HC

Desired output

    test
0   HC
2   HC RD
6   CEA HC

Solution

  • You can use the str.contains method with the regex pattern \bHC\b:

    >>> df[df['test'].str.contains(r'\bHC\b')]
         test
    0      HC
    2   HC RD
    6  CEA HC
    

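    \b matches a word boundary: the position between a word character (letter, digit, or underscore) and a non-word character, or the start or end of the string. This is why "CHC" is excluded while "HC RD" and "CEA HC" match.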
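
To reuse the word-boundary approach above for an arbitrary search term, here is a minimal sketch. The helper name rows_with_token and its whitespace_only flag are hypothetical, introduced only for illustration. It assumes the search term should be matched literally (hence re.escape) and that missing values should count as non-matches (na=False).

```python
import re
import pandas as pd

def rows_with_token(df, column, token, whitespace_only=False):
    """Return the rows of df whose `column` contains `token` as a whole word.

    Hypothetical helper for illustration only.
    """
    escaped = re.escape(token)  # treat the token literally, not as a regex
    if whitespace_only:
        # Match only when the token is delimited by whitespace or string edges.
        pattern = r'(?<!\S)' + escaped + r'(?!\S)'
    else:
        # Match at regex word boundaries (punctuation also counts as a boundary).
        pattern = r'\b' + escaped + r'\b'
    mask = df[column].str.contains(pattern, regex=True, na=False)
    return df[mask]

df = pd.DataFrame({'test': ['HC', 'CHC', 'HC RD', 'RD', 'MRD', 'CEA', 'CEA HC']})
result = rows_with_token(df, 'test', 'HC')
print(result)
```

Note that \b also treats punctuation as a boundary, so a value such as "HC-RD" would match. If only space-separated tokens should count, passing whitespace_only=True switches to the lookaround pattern, which requires whitespace or the string edge on both sides.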