I would like to extract rows containing a particular string. The string can appear as one token of a larger, space-separated value (which I DO want to match), or embedded inside another continuous string (which I do NOT want to match). It can occur at the start, middle, or end of the value.
Example - say I would like to extract any row containing "HC":
import pandas as pd
df = pd.DataFrame(columns=['test'])
df['test'] = ['HC', 'CHC', 'HC RD', 'RD', 'MRD', 'CEA', 'CEA HC']
test
0 HC
1 CHC
2 HC RD
3 RD
4 MRD
5 CEA
6 CEA HC
Desired output
test
0 HC
2 HC RD
6 CEA HC
You can use the str.contains method with the regex pattern \bHC\b:
>>> df[df['test'].str.contains(r'\bHC\b')]
test
0 HC
2 HC RD
6 CEA HC
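\b: word boundary, i.e. the position between a word character (\w) and a non-word character or the start/end of the string.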
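
Note that \b also treats punctuation as a boundary, so a value such as 'HC-RD' would match \bHC\b. If only whitespace should separate tokens, as the question describes, one possible variant is sketched below. It assumes the search string may contain regex metacharacters and that the column may contain missing values; the helper name rows_with_token is made up for illustration and is not part of pandas.

```python
import re
import pandas as pd

df = pd.DataFrame({'test': ['HC', 'CHC', 'HC RD', 'RD', 'MRD', 'CEA', 'CEA HC']})

def rows_with_token(frame, column, token):
    """Hypothetical helper: keep rows where `token` is a whitespace-delimited word."""
    # re.escape neutralizes regex metacharacters in the token (e.g. '.' or '+').
    # (?<!\S) requires whitespace or the start of the string before the token;
    # (?!\S) requires whitespace or the end of the string after it.
    pattern = r'(?<!\S)' + re.escape(token) + r'(?!\S)'
    # na=False makes missing values count as non-matches instead of propagating NaN.
    mask = frame[column].str.contains(pattern, regex=True, na=False)
    return frame[mask]

result = rows_with_token(df, 'test', 'HC')
print(result)
```

On the sample data this selects the same rows as \bHC\b (index 0, 2 and 6). The two patterns only differ when a token is bordered by punctuation rather than spaces.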