I have a text file and two user-defined word lists, one of positive words and one of negative words. I am comparing the words in those two lists against the text and returning either positive or negative.
But I also need to print the keywords found in the text that caused it to be categorized as positive or negative.
Example of the output I am looking for:
file_name    IBM                    Keywords   Label
audio1.wav   The customer is good   good       Positive
audio2.wav   the service is bad     bad        Negative
Please let me know how to go about it. Here's the code so far:
pos = readwords('C:\\Users\\anagha\\Desktop\\SynehackData\\positive.txt')
neg = readwords('C:\\Users\\anagha\\Desktop\\SynehackData\\Negative.txt')
pos = [w.lower() for w in pos]
neg = [w.lower() for w in neg]
def assign_comments_labels(x):
try:
if any(w in x for w in pos) :
return 'positive'
elif any(w in x for w in neg):
return 'negative'
else:
return 'neutral'
except:
return 'neutral'
import pandas as pd
df = pd.read_csv("C:\\Users\\anagha\\Desktop\\SynehackData\\noise_free_audio\\outputfile.csv", encoding="utf-8")
df['IBM'] = df['IBM'].str.lower()
df['file_name'] = df['file_name'].str.lower()
df['labels'] = df['IBM'].apply(lambda x: assign_comments_labels(x))
df[['file_name','IBM','labels']]
A good start would be to fix the indentation in the assign_comments_labels(x) function: everything after the def line, from try: down to the last return, must be indented so that it belongs to the function body.
Edited answer:
OK, I get your question now.
This code should work for you, based on the logic you used above:
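def get_keyword(x):
    try:
        # Split inside the try block so that non-string values (e.g. NaN) are caught
        for word in x.split(" "):
            if (word in neg) or (word in pos):
                return word  # first keyword found in the sentence
    except AttributeError:
        return -1
    return -1  # no keyword in the sentence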
Then you can apply it with a lambda, as you did for the labels:
df['keywords'] = df['IBM'].apply(lambda x: get_keyword(x))
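One thing to watch out for: get_keyword compares whole words after splitting on spaces, while the `w in x` test in assign_comments_labels matches substrings, so the label and the keyword can disagree (for example "bad" is found inside "badge", and "good." with a trailing period is not equal to "good"). Here is a minimal sketch of one way to make the matching consistent. The sample word lists and the helper names `tokenize` and `first_keyword` are made up for illustration, they are not from your code:

```python
import re

# Hypothetical sample word lists; in the question these come from readwords(...)
pos = ["good", "great"]
neg = ["bad", "poor"]

def tokenize(text):
    """Lowercase the text and return its words with punctuation stripped."""
    return re.findall(r"[a-z']+", text.lower())

def first_keyword(text):
    """Return the first word found in pos or neg, or -1 if there is none."""
    for word in tokenize(text):
        if word in pos or word in neg:
            return word
    return -1

sentence = "He wore a badge. The service was good."
print(any(w in sentence for w in neg))  # True: "bad" is a substring of "badge"
print(sentence.split(" ")[-1] in pos)   # False: the last token is "good."
print(first_keyword(sentence))          # good
```

If you go this way, it probably makes sense to use the same tokenized words inside assign_comments_labels too, so that both columns are based on the same matches.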
Edit 2:
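To return multiple keywords per sentence, you can modify the code to return a list:
def get_keyword(x):
    keywords = []
    try:
        # Split inside the try block so that non-string values (e.g. NaN) are caught
        for word in x.split(" "):
            if (word in neg) or (word in pos):
                keywords.append(word)  # collect every keyword, not just the first
    except AttributeError:
        return -1
    return keywords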
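If you want the keywords column to show plain text like in your example output rather than a Python list, one option is to join the list into a string. This is only a sketch: the name `get_keywords_text` and the sample sets are assumptions for illustration, and I used sets here because membership tests on a set are faster than on a list.

```python
# Hypothetical sample word sets; in the question these come from readwords(...)
pos = {"good", "great"}
neg = {"bad", "poor"}

def get_keywords_text(x):
    """Return all positive/negative keywords in x as a comma-separated string."""
    if not isinstance(x, str):
        return ""  # e.g. NaN from an empty CSV cell
    keywords = [word for word in x.split() if word in pos or word in neg]
    return ", ".join(keywords)

print(get_keywords_text("the food was good but the service is bad"))  # good, bad
```

You would then use it the same way as before: df['keywords'] = df['IBM'].apply(get_keywords_text).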
An even better solution would be to create two functions, one that collects the positive keywords and one that collects the negative keywords.
Instead of a single keywords column in your DataFrame, you would then have two: one for pos and one for neg.
Texts usually contain both positive and negative keywords, and it is the weight of each word that determines whether the sentence as a whole is positive or negative. If that is your case, I highly recommend implementing the second solution.
Note:
For the second solution, change the if statement as follows:
# For the positive keywords function
if word in pos:
    keywords.append(word)
# For the negative keywords function
if word in neg:
    keywords.append(word)
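Putting the second solution together, a rough sketch could look like the following. The sample data, the column names `pos_keywords` and `neg_keywords`, and the helper `label_from_counts` are all made up for illustration. The labeling rule is also an assumption: it gives every keyword the same weight and simply compares the counts, so you would swap in your own weights if you have them.

```python
import pandas as pd

# Hypothetical sample data standing in for positive.txt, Negative.txt and outputfile.csv
pos = {"good", "great"}
neg = {"bad", "poor"}
df = pd.DataFrame({
    "file_name": ["audio1.wav", "audio2.wav", "audio3.wav"],
    "IBM": [
        "the customer is good",
        "the service is bad",
        "good product but bad and poor support",
    ],
})

def get_pos_keywords(x):
    """Return the list of positive keywords in the sentence."""
    return [word for word in x.split() if word in pos]

def get_neg_keywords(x):
    """Return the list of negative keywords in the sentence."""
    return [word for word in x.split() if word in neg]

def label_from_counts(row):
    # Assumption: every keyword has equal weight, so the larger count wins
    n_pos = len(row["pos_keywords"])
    n_neg = len(row["neg_keywords"])
    if n_pos > n_neg:
        return "positive"
    if n_neg > n_pos:
        return "negative"
    return "neutral"

df["pos_keywords"] = df["IBM"].apply(get_pos_keywords)
df["neg_keywords"] = df["IBM"].apply(get_neg_keywords)
df["labels"] = df.apply(label_from_counts, axis=1)
print(df[["file_name", "pos_keywords", "neg_keywords", "labels"]])
```

With this layout a sentence that contains both kinds of words keeps both lists, and the label column is derived from them instead of from whichever list happens to be checked first.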
Hope that helps