nlp nltk sentiment-analysis lexicon

Best lexicons for sentence vs document level analysis


What are the best lexicons for document-level and sentence-level analysis? I'm currently using VADER for sentence-level analysis, but I'm worried that when I move to the document level, VADER may not perform as well as other lexicons.

This is similar to the question in the post here, but more specific.


Solution

  • In addition to the sentiment lexica listed in the linked post, I can recommend the AFINN sentiment lexicon.

    For sentiment analysis, relying only on lexica may not be the best solution, especially at the document level. Language is flexible enough that attributes and notions other than sentiment-laden vocabulary affect semantics deeply.

    Some of the core notions are contrastive discourse markers (especially at the document level), negation, and modality.

    Some documents express opinions with both pros and cons, and we tie those together with markers such as 'however' and 'nevertheless' to convey a meaning or an idea. A bag-of-words approach treats the two sentences below identically, yet if people were asked to annotate each with a single sentiment label, they might not choose the same one for both:

    The laptop has amazing features, but its screen is killing me.
    The laptop's screen is killing me, but it has amazing features.
    

    In general, we evaluate these kinds of sentences or paragraphs by the sentiment of the clause after 'but'. Other contrastive discourse markers have their own semantics as well. This is studied in an area called discourse analysis.
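To make the point about the two laptop sentences concrete, here is a minimal sketch. The lexicon `TOY_LEXICON` and its scores are made up for illustration (they are not taken from AFINN or VADER), and `but_aware_score` is a hypothetical helper that implements only the rough "score the clause after 'but'" heuristic, not a real discourse parser.

```python
import re

# Toy lexicon with made-up scores, for illustration only.
TOY_LEXICON = {"amazing": 3.0, "killing": -3.0}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words_score(text):
    """Sum lexicon scores over all tokens, ignoring word order."""
    return sum(TOY_LEXICON.get(tok, 0.0) for tok in tokenize(text))


def but_aware_score(text):
    """Score only the clause after the last 'but'.

    If the text contains no 'but', the whole text is scored.
    """
    clauses = re.split(r"\bbut\b", text, flags=re.IGNORECASE)
    return bag_of_words_score(clauses[-1])


a = "The laptop has amazing features, but its screen is killing me."
b = "The laptop's screen is killing me, but it has amazing features."

# Bag of words cannot tell the two sentences apart.
print(bag_of_words_score(a), bag_of_words_score(b))

# The 'but' heuristic gives them opposite polarities.
print(but_aware_score(a), but_aware_score(b))
```

With this toy lexicon the bag-of-words scores are identical for both sentences, while the 'but'-aware scores have opposite signs. Other markers ('however', 'nevertheless', 'although') would each need their own rule.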

    Negation and modality change semantics as well, so they cannot be overlooked at either level. There are studies and papers that have used negation and modality triggers together with sentiment lexica. You can search for 'negation and modality on sentiment analysis' to see what you can do.
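As a rough idea of how triggers can be combined with a lexicon, here is a small sketch. Everything in it is an assumption made for illustration: the trigger lists, the toy scores, the three-token window, and the choice to flip polarity after a negation and halve the score after a modal. Published approaches differ in all of these details.

```python
import re

# Made-up scores and trigger lists, for illustration only.
TOY_LEXICON = {"good": 2.0, "bad": -2.0, "great": 3.0}
NEGATION_TRIGGERS = {"not", "never", "no"}
MODALITY_TRIGGERS = {"might", "could", "may", "would"}


def score_with_triggers(text, window=3):
    """Lexicon score where triggers modify the next `window` tokens.

    A negation trigger flips the polarity of the following tokens;
    a modality trigger halves their contribution.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    negate_left = 0  # tokens still inside a negation scope
    hedge_left = 0   # tokens still inside a modality scope
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            negate_left = window
            continue
        if tok in MODALITY_TRIGGERS:
            hedge_left = window
            continue
        value = TOY_LEXICON.get(tok, 0.0)
        if negate_left > 0:
            value = -value
            negate_left -= 1
        if hedge_left > 0:
            value *= 0.5
            hedge_left -= 1
        total += value
    return total


print(score_with_triggers("The battery is good"))
print(score_with_triggers("The battery is not good"))
print(score_with_triggers("The battery might be good"))
```

Note that this sketch does not handle contracted negations such as "isn't", and a fixed window is only a crude stand-in for the real scope of a negation or modal.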

    Finally, if you have a domain-specific dataset, I suggest building your own lexicon using distant supervision.
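One possible way to do this, sketched under several assumptions: the reviews and star ratings below are invented, `noisy_label` is a hypothetical rule that uses the star rating as a noisy stand-in for a human sentiment label (that is the distant-supervision part), and the word score is a smoothed log-odds ratio, which is just one of several scoring choices.

```python
import math
import re
from collections import Counter


def noisy_label(stars):
    """Map a star rating to a noisy label; mid ratings are dropped."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None


def build_lexicon(labeled_docs, min_count=2):
    """Score each word by add-one smoothed log-odds of pos vs. neg usage."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in labeled_docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        (pos_counts if label == "pos" else neg_counts).update(tokens)

    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)

    lexicon = {}
    for word in vocab:
        p, n = pos_counts[word], neg_counts[word]
        if p + n < min_count:
            continue  # too rare to trust
        lexicon[word] = math.log((p + 1) / pos_total) - math.log((n + 1) / neg_total)
    return lexicon


# Invented example data: (review text, star rating).
reviews = [
    ("Battery life is great and the screen is great", 5),
    ("Great keyboard, great value", 4),
    ("The fan is loud and the hinge is flimsy", 1),
    ("Loud fan, flimsy build", 2),
    ("It is a laptop", 3),
]

labeled = [(text, noisy_label(stars)) for text, stars in reviews
           if noisy_label(stars) is not None]
lexicon = build_lexicon(labeled)

for word, score in sorted(lexicon.items(), key=lambda kv: kv[1]):
    print(f"{word:8s} {score:+.2f}")
```

On a real dataset you would want far more documents, a higher `min_count`, and some stop-word handling, since function words pick up spurious scores on small samples.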

    Hope this helps,

    Cheers