I am trying to generate keywords automatically with a machine learning algorithm. The output also contains some unwanted keywords, and I need to remove these unwanted/redundant words from the output column algorithmically. (Unwanted keywords are words that do not exist in the input column but are still generated in the output column.) Below is an example: I generate keywords from the "query_text" column, and the results are stored in the "auto generated keywords" column. As you can see, a few keywords are extracted unnecessarily ('diamond' and 'ring'); I highlighted them in red (in row 1 and row 3 respectively). The final "corrected keywords" column contains only the necessary words.
How can I do this algorithmically by comparing the results ("auto generated keywords") with the input ("query_text")?
S.No  query_text                       auto generated keywords  corrected keywords
1     I want ring                      diamond|ring             ring
2     I want wedding band              band|wedding             band|wedding
3     I look for sapphire collection   ring|sapphire            sapphire
4     I want diamond earring           diamond|earring          diamond|earring
5     I am looking for stackable ring  ring|stackable           ring|stackable
6     I need gold bracelet             bracelet|gold            bracelet|gold
7     I look for gold ring             gold|ring                gold|ring
8     I need sapphire ring             ring|sapphire            ring|sapphire
[Image: data with highlighted extra words]
You can use a list comprehension over pairs of query text and auto generated keywords (zip), with a set for efficient membership testing:
# Keep only the generated keywords that also occur as words in the query text
df['corrected keywords'] = ['|'.join(w for w in l if w in S)
                            for S, l in zip(df['query_text'].apply(lambda x: set(x.split())),
                                            df['auto generated keywords'].str.split('|'))]
Output:
   S.No  query_text                       auto generated keywords  corrected keywords
0  1     I want ring                      diamond|ring             ring
1  2     I want wedding band              band|wedding             band|wedding
2  3     I look for sapphire collection   ring|sapphire            sapphire
3  4     I want diamond earring           diamond|earring          diamond|earring
4  5     I am looking for stackable ring  ring|stackable           ring|stackable
5  6     I need gold bracelet             bracelet|gold            bracelet|gold
6  7     I look for gold ring             gold|ring                gold|ring
7  8     I need sapphire ring             ring|sapphire            ring|sapphire
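If the same filter needs to be reused or tested outside the DataFrame, the logic can be pulled into a small helper. The sketch below is only one possible way to do it: the name filter_keywords and its sep parameter are hypothetical, and the lowercasing is an added assumption (the one-liner above compares words exactly as written).

```python
def filter_keywords(query, keywords, sep="|"):
    """Keep only the keywords that occur as whole words in the query."""
    # Assumption: matching is case-insensitive; remove .lower() for exact matching.
    query_words = set(query.lower().split())
    return sep.join(k for k in keywords.split(sep) if k.lower() in query_words)


# A few rows from the example data, as plain lists
queries = ["I want ring", "I want wedding band", "I look for sapphire collection"]
generated = ["diamond|ring", "band|wedding", "ring|sapphire"]

corrected = [filter_keywords(q, k) for q, k in zip(queries, generated)]
print(corrected)  # ['ring', 'band|wedding', 'sapphire']

# With the DataFrame from the question, the same call would look like:
# df['corrected keywords'] = [filter_keywords(q, k)
#                             for q, k in zip(df['query_text'],
#                                             df['auto generated keywords'])]
```

Note that this only splits the query on whitespace, so a word followed by punctuation (for example "ring.") would not match the keyword "ring" unless the text is cleaned first.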