pythonmachine-learningnlpsentiment-analysis

How to remove words from a sentence that carry no positive or negative sentiment?


Im trying a sentiment analysis based approach on youtube comments, but the comments many times have words like mrbeast, tiger/'s, lion/'s, pewdiepie, james, etc which do not add any feeling in the sentence. I've gone through nltk's average_perception_tagger but it didn't work well as it gave the results as

my input:

"mrbeast james lion tigers bad sad clickbait fight nice good"

words that i need in my sentence:

"bad sad clickbait fight nice good"

what i got using average_perception_tagger:

[('mrbeast', 'NN'),
 ('james', 'NNS'),
 ('lion', 'JJ'),
 ('tigers', 'NNS'),
 ('bad', 'JJ'),
 ('sad', 'JJ'),
 ('clickbait', 'NN'),
 ('fight', 'NN'),
 ('nice', 'RB'),
 ('good', 'JJ')]

so as you can see if i remove mrbeast i.e NN the words like clickbait, fight will also get removed which than ultimately remove expressions from that sentence.


Solution

  • There are multiple ways of doing this like

    1. you can create a set of positive and negative words and for each word in your grammar you can check if it exists in your set, if it does you should keep the word, else delete it. This however would first require all positive and negative words dataset.

    2. you can use something like textblob which can give you the sentiment score of a word or a sentence. so with a cutoff sentiment score you can filter out the words that you don't need.