Tags: python, string, python-3.x, match, search-keywords

How to print the matched words in Python


I have a text file and two user-defined files, one with positive words and one with negative words. I am comparing the words in those two files with the text file and returning either positive or negative.

But I also need to print the keywords in the text that caused it to be categorized as positive or negative.

Example of the output I am looking for:

file_name       IBM                         Keywords     Label

audio1.wav     The customer is good         good         Positive
audio2.wav     the service is bad           bad          Negative

Please let me know how to go about it. Here's the code so far:

pos = readwords('C:\\Users\\anagha\\Desktop\\SynehackData\\positive.txt')
neg = readwords('C:\\Users\\anagha\\Desktop\\SynehackData\\Negative.txt')

pos = [w.lower() for w in pos]
neg = [w.lower() for w in neg]

def assign_comments_labels(x):
    try:
        if any(w in x for w in pos) :      
            return 'positive'
        elif any(w in x for w in neg):
            return 'negative'
        else:
            return 'neutral'
    except:
        return 'neutral'

import pandas as pd
df = pd.read_csv("C:\\Users\\anagha\\Desktop\\SynehackData\\noise_free_audio\\outputfile.csv", encoding="utf-8") 

df['IBM'] = df['IBM'].str.lower()
df['file_name'] = df['file_name'].str.lower()

df['labels'] = df['IBM'].apply(lambda x: assign_comments_labels(x))

df[['file_name','IBM','labels']] 

Solution

  • A good start would be to have the right indentation in the assign_comments_labels(x) function. Indent the whole body.

    Edited answer:
    OK, I understand your question now.

    Based on the logic you used above, this code should work for you:

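    def get_keyword(x):
        """Return the first word of x found in pos or neg, or -1 if there is none."""
        try:
            # split() with no argument also copes with repeated spaces and tabs
            for word in x.split():
                if (word in neg) or (word in pos):
                    return word
        except AttributeError:
            # x is not a string (e.g. NaN from an empty cell)
            return -1

        return -1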
    

    Then you can use a lambda, as you did for the labels:

    df['keywords'] = df['IBM'].apply(lambda x: get_keyword(x))
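
As a quick check, here is a minimal, self-contained sketch of the single-keyword version run on a tiny DataFrame. The word lists and the three rows are made up for illustration (they mirror the example in the question); in your code, pos and neg come from readwords and df comes from your CSV.

```python
import pandas as pd

# Hypothetical word lists standing in for positive.txt and Negative.txt
pos = ["good", "great"]
neg = ["bad", "poor"]

def get_keyword(x):
    """Return the first word of x found in pos or neg, or -1 if there is none."""
    if not isinstance(x, str):  # e.g. NaN from an empty CSV cell
        return -1
    for word in x.split():
        if word in pos or word in neg:
            return word
    return -1

# Made-up rows; the column names follow the question
df = pd.DataFrame({
    "file_name": ["audio1.wav", "audio2.wav", "audio3.wav"],
    "IBM": ["the customer is good", "the service is bad", "hello there"],
})
df["keywords"] = df["IBM"].apply(get_keyword)
print(df[["file_name", "IBM", "keywords"]])
```

Note that this matches whole words only, so a word followed by punctuation (for example "good.") would not be found unless the punctuation is stripped first.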
    

    Edit 2:
    To return multiple keywords per sentence, you can modify the code to return a list:

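    def get_keyword(x):
        """Return a list of every word of x found in pos or neg."""
        keywords = []
        try:
            for word in x.split():
                if (word in neg) or (word in pos):
                    keywords.append(word)
        except AttributeError:
            # x is not a string (e.g. NaN from an empty cell)
            return -1

        return keywords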
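
If you want the list to print as plain text, like the Keywords column in the question, one possible sketch is below. The name get_keywords, the word sets and the sample rows are hypothetical. Unlike the function above, this variant returns an empty list (not -1) for non-string input, so that every cell in the column holds a list and can be joined.

```python
import pandas as pd

# Hypothetical word sets; sets make the membership test fast
pos = {"good", "helpful"}
neg = {"bad", "slow"}

def get_keywords(x):
    """Return every word of x found in pos or neg, in order of appearance."""
    if not isinstance(x, str):  # e.g. NaN from an empty CSV cell
        return []
    return [word for word in x.split() if word in pos or word in neg]

# Made-up rows for illustration
df = pd.DataFrame({
    "IBM": [
        "the agent was helpful but the service is slow",
        "nothing to report",
    ],
})
df["keywords"] = df["IBM"].apply(get_keywords)
# Join each list into one comma-separated string for display
df["keywords_text"] = df["keywords"].apply(", ".join)
print(df)
```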
    

    An even better solution would be to create two functions, one that collects the positive keywords and one that collects the negative keywords.

    Instead of one column for keywords in your DataFrame, you will then have two: one for pos and one for neg.

    Texts usually contain both positive and negative keywords, and the weight of each word determines whether the sentence as a whole is classified as positive or negative. If that is your case, I highly recommend implementing the second solution.

    Note:
    For the second solution, change the if statement to:

    # For positive keywords function    
    if word in pos:
        keywords.append(word)
    
    # For negative keywords function
    if word in neg:
        keywords.append(word)
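
Putting the two-column idea together could look roughly like this. It is only a sketch: find_words and label_from_counts are hypothetical helper names, the word sets and rows are made up, and the labelling rule assumes every keyword carries the same weight (the larger count wins), which is just one simple way to weigh the words.

```python
import pandas as pd

# Hypothetical word sets standing in for the positive and negative files
pos = {"good", "helpful"}
neg = {"bad", "slow"}

def find_words(text, vocabulary):
    """Return the words of text that appear in vocabulary."""
    if not isinstance(text, str):  # e.g. NaN from an empty CSV cell
        return []
    return [word for word in text.split() if word in vocabulary]

def label_from_counts(row):
    """Assumption: equal weight per keyword, so the larger count decides."""
    n_pos = len(row["pos_keywords"])
    n_neg = len(row["neg_keywords"])
    if n_pos > n_neg:
        return "positive"
    if n_neg > n_pos:
        return "negative"
    return "neutral"

# Made-up rows for illustration
df = pd.DataFrame({
    "IBM": [
        "the customer is good",
        "the service is bad and slow",
        "good agent but slow service",
    ],
})
df["pos_keywords"] = df["IBM"].apply(lambda text: find_words(text, pos))
df["neg_keywords"] = df["IBM"].apply(lambda text: find_words(text, neg))
df["labels"] = df.apply(label_from_counts, axis=1)
print(df)
```

Keeping the two lists in separate columns means you can later swap the count comparison for real per-word weights without touching the keyword extraction.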
    

    Hope that helps