
NLTK Vader SentimentIntensityAnalyzer Bigram


For the VADER SentimentIntensityAnalyzer in Python, is there a way to add a bigram rule? I tried adding a two-word entry to the lexicon, but the polarity score did not change. Thanks in advance!

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()

#returns a compound score of -0.296
print(analyser.polarity_scores('no issues'))

analyser.lexicon['no issues'] = 0.0
#still returns a compound score of -0.296
print(analyser.polarity_scores('no issues'))

Solution

  • There is no straightforward way to add a bigram to the VADER lexicon, because VADER looks up individual tokens when scoring, so a lexicon key that contains a space is never matched. However, you can work around this with the following steps:

    1. Merge each bigram into a single token. For example, convert the bigram "no issues" into the token "noissues".
    2. Keep a dictionary that maps each new token to its polarity, e.g. {"noissues": 2}, and add these entries to the VADER lexicon.
    3. Preprocess the text so that each bigram is replaced by its merged token before the sentiment score is calculated.

    The following code implements these steps:

    import nltk

    # `analyser` is the SentimentIntensityAnalyzer instance from the question
    allowed_bigrams = {'noissues': 2}  # add more as required
    # register the merged tokens in the lexicon so VADER can score them
    analyser.lexicon.update(allowed_bigrams)

    def process_text(text):
        tokens = text.lower().split()  # list of tokens
        bigrams = list(nltk.bigrams(tokens))  # bigrams as tuples of tokens
        bigrams = list(map(''.join, bigrams))  # join each pair without a space
        bigrams.append('...')  # pad so bigrams and tokens have equal length

        # rebuild the text
        final = ''
        for i, token in enumerate(tokens):
            b = bigrams[i]

            if b in allowed_bigrams:
                join_word = b  # replace the word with the merged bigram
                tokens[i+1] = ''  # blank out the next word so it is skipped
            else:
                join_word = token
            final += join_word + ' '
        return final

    text = 'Hello, I have no issues with you'
    print(text)
    print(analyser.polarity_scores(text))
    final = process_text(text)
    print(final)
    print(analyser.polarity_scores(final))
    

    Output:

    Hello, I have no issues with you
    {'neg': 0.268, 'neu': 0.732, 'pos': 0.0, 'compound': -0.296}
    hello, i have noissues  with you 
    {'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.4588}
    

    Notice in the output how the two words "no" and "issues" have been joined into the single token "noissues".
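
The merging step can also be sketched without NLTK. The version below is only an illustration under a few assumptions: the names `BIGRAM_SCORES`, `merge_bigrams` and `lexicon_entries` are hypothetical, bigrams are matched case-insensitively, and punctuation attached to the second word (as in "no issues!") is carried over to the merged token. Unlike the code above, it leaves the capitalization of all other words untouched and does not leave extra spaces behind.

```python
import string

# Hypothetical mapping from two-word phrases to the polarity you want them to have.
BIGRAM_SCORES = {("no", "issues"): 2.0}

def merge_bigrams(text, bigram_scores):
    """Return text with every listed bigram fused into a single lowercase token."""
    words = text.split()
    merged = []
    i = 0
    while i < len(words):
        if i + 1 < len(words):
            first = words[i].lower()
            # Separate trailing punctuation from the second word before matching.
            second_core = words[i + 1].rstrip(string.punctuation)
            trailing = words[i + 1][len(second_core):]
            if (first, second_core.lower()) in bigram_scores:
                merged.append(first + second_core.lower() + trailing)
                i += 2  # both words of the bigram are consumed
                continue
        merged.append(words[i])
        i += 1
    return " ".join(merged)

def lexicon_entries(bigram_scores):
    """Build the {merged_token: score} entries to add to the analyser's lexicon."""
    return {a + b: score for (a, b), score in bigram_scores.items()}

print(merge_bigrams("Hello, I have no issues with you", BIGRAM_SCORES))
print(lexicon_entries(BIGRAM_SCORES))
```

To use it with the analyser from the question, you would first call `analyser.lexicon.update(lexicon_entries(BIGRAM_SCORES))` and then score `merge_bigrams(text, BIGRAM_SCORES)` instead of the raw text. A bigram whose first word carries attached punctuation (for example "no, issues") is deliberately not merged in this sketch.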