Hello Everyone :) I hope that you are well.
I am new to Python and am having trouble getting an exact word match. I have a list of words, key_list, and I need to count how many times any word from key_list appears in each row of the string column df['response'].
Currently, this is the code that I am using:
df['count_response']=df['response'].str.count('|'.join(key_list))
This is the output that I receive:
key_list: ['honestli', 'know', 'realli', 'feel', 'wast', 'time', 'school', 'good', 'reason', 'go', 'colleg',
'howev', 'wonder', 'whether', 'continu', 'cant', 'see', 'frankli', 'care', 'less', 'understand']
    response              count_response
0   parent said           0
1   want make differ      0
2   dont know             1
3   rich                  0
4   go career want        2
5   actuari               0
6   social life           0
7   expect societi        0
8                         0
9   help peopl            0
10  realli love learn     1
11  money                 0
12  passion field         0
13  happi learn econom    0
14  want uplift peopl     0
Unfortunately, this is not the correct output. In row 4, count_response is 2, but the only word in that response that appears in key_list is "go". I suspect that Python is also counting "care" (which is in key_list) because it occurs inside the word "career". It should not be counted, since I need an exact word match.
Thank you for your time, I appreciate any responses!
I think you need word boundaries: wrap each keyword in \b...\b so that it only matches as a whole word:
df['count_response']=df['response'].str.count('|'.join(r"\b{}\b".format(x) for x in key_list))
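To see why the boundaries matter, here is a minimal sketch that uses only the standard re module on a few of the responses from the question. The shortened key_list, the responses list, and the names loose and exact are just for illustration. It also assumes you may want re.escape, which is not needed for these particular keywords but guards against keywords that contain regex metacharacters. The exact pattern string should work the same way when passed to df['response'].str.count.

```python
import re

# Shortened subset of the question's key_list, for illustration only.
key_list = ['go', 'care', 'know', 'realli']
responses = ['go career want', 'dont know', 'realli love learn', 'parent said']

# Plain alternation: a keyword matches anywhere, even inside a longer word.
loose = re.compile('|'.join(map(re.escape, key_list)))

# Alternation wrapped in \b...\b: a keyword matches only as a whole word.
exact = re.compile(r'\b(?:' + '|'.join(map(re.escape, key_list)) + r')\b')

loose_counts = [len(loose.findall(r)) for r in responses]
exact_counts = [len(exact.findall(r)) for r in responses]

print(loose_counts)  # [2, 1, 1, 0]  ("care" is found inside "career")
print(exact_counts)  # [1, 1, 1, 0]
```

Putting a single \b on each side of one non-capturing group is equivalent to wrapping every keyword separately, and it keeps the pattern a little shorter.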