python string pandas exact-match

Python: Exact word match using a list and data frame


Hello everyone :) I hope you are well. I am new to Python and am having trouble getting an exact word match. I have a list of words, key_list, and I need to count how many times any word from key_list appears in each row of the string column df['response'].

Currently, this is the code that I am using:

df['count_response']=df['response'].str.count('|'.join(key_list)) 

This is the output that I receive:

key_list:  ['honestli', 'know', 'realli', 'feel', 'wast', 'time', 'school', 'good', 'reason', 'go', 'colleg', 
'howev', 'wonder', 'whether', 'continu', 'cant', 'see', 'frankli', 'care', 'less', 'understand']
              response  count_response
0          parent said             0
1     want make differ             0
2            dont know             1
3                 rich             0
4       go career want             2
5              actuari             0
6          social life             0
7       expect societi             0
8                                  0
9           help peopl             0
10   realli love learn             1
11               money             0
12       passion field             0
13  happi learn econom             0
14   want uplift peopl             0

Unfortunately, this is not the correct output. In row 4 ("go career want"), count_response is 2, even though "go" is the only word from key_list that appears there as a whole word. I suspect that Python is also counting "care" (which is in key_list) because it occurs inside "career", but it should not, since I need an exact word match.

Thank you for your time, I appreciate any responses!


Solution

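  • I think you need word boundaries: wrap each keyword in \b ... \b so that it matches only as a whole word and not inside a longer word such as "career":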

    df['count_response']=df['response'].str.count('|'.join(r"\b{}\b".format(x) for x in key_list))
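
To see the difference without a data frame, here is a minimal sketch using only the standard re module. The helper name build_exact_pattern, the shortened key_list and the responses list are made up for illustration. The re.escape call is an extra precaution that the line above does not include; it only matters if a keyword contains regex metacharacters.

```python
import re

# Illustrative subset of the question's data (not the full key_list).
key_list = ['go', 'care', 'know', 'realli']
responses = ['dont know', 'go career want', '', 'realli love learn']


def build_exact_pattern(words):
    """Return a regex string that matches any of `words` as a whole word."""
    # Escape each word so characters such as '.' or '+' are taken literally.
    alternatives = '|'.join(re.escape(word) for word in words)
    # One pair of \b anchors around a non-capturing group covers every word.
    return r'\b(?:{})\b'.format(alternatives)


# Plain alternation: 'care' also matches inside 'career'.
loose_pattern = '|'.join(key_list)
exact_pattern = build_exact_pattern(key_list)

loose_counts = [len(re.findall(loose_pattern, text)) for text in responses]
exact_counts = [len(re.findall(exact_pattern, text)) for text in responses]

print(loose_counts)  # [1, 2, 0, 1]
print(exact_counts)  # [1, 1, 0, 1]
```

The string returned by build_exact_pattern is an ordinary regular expression, so it should be usable in the same way as the pattern above, for example df['response'].str.count(build_exact_pattern(key_list)).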