nlp machine-learning text-analysis sentiment-analysis training-data

Training data for sentiment analysis


Where can I get a corpus of documents that have already been classified as positive or negative for sentiment in the corporate domain? I want a large corpus of documents that review companies, such as reviews written by analysts and the media.

I can find corpora of product and movie reviews. Is there a corpus for the business domain, including reviews of companies, that matches the language of business?


Solution

  • Movie review data from Cornell: http://www.cs.cornell.edu/home/llee/data/

    MPQA Opinion Corpus: http://mpqa.cs.pitt.edu/corpora/mpqa_corpus

    You can also use Twitter, treating the emoticons in tweets as noisy sentiment labels, as in this paper: http://web.archive.org/web/20111119181304/http://deepthoughtinc.com/wp-content/uploads/2011/01/Twitter-as-a-Corpus-for-Sentiment-Analysis-and-Opinion-Mining.pdf

    Hope that gets you started. There's more in the literature, if you're interested in specific subtasks like negation, sentiment scope, etc.

    To focus on companies, you might pair a sentiment method with topic detection or, more cheaply, just collect a lot of mentions of a given company. Alternatively, you could have your data annotated by Mechanical Turk workers.
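As a rough illustration of the emoticon and company-mention ideas above, here is a minimal Python sketch. It assumes you already have raw texts (tweets, for example) as a list of strings. The function names, the tiny emoticon patterns, and the company "Acme Corp" are all hypothetical and only serve the example; this is not the exact procedure from the linked paper, and the resulting labels are noisy.

```python
import re

# Small, deliberately incomplete emoticon patterns (an assumption for this sketch).
POSITIVE = re.compile(r"[:;=]-?[)D\]]")        # :) :-) ;) :D =)
NEGATIVE = re.compile(r"[:=]-?[(\[]|:'\(")     # :( :-( =( :'(


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None if the signal is absent or mixed."""
    has_pos = bool(POSITIVE.search(text))
    has_neg = bool(NEGATIVE.search(text))
    if has_pos == has_neg:  # no emoticon at all, or both kinds present
        return None
    return "positive" if has_pos else "negative"


def strip_emoticons(text):
    """Remove the emoticons so a classifier cannot simply memorize them."""
    text = POSITIVE.sub("", text)
    text = NEGATIVE.sub("", text)
    return " ".join(text.split())


def mentions_company(text, aliases):
    """Cheap stand-in for topic detection: case-insensitive whole-phrase match."""
    lowered = text.lower()
    for alias in aliases:
        pattern = r"(?<!\w)" + re.escape(alias.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            return True
    return False


def build_corpus(texts, aliases):
    """Keep texts that mention the company and carry an unambiguous emoticon."""
    corpus = []
    for text in texts:
        if not mentions_company(text, aliases):
            continue
        label = label_by_emoticon(text)
        if label is None:
            continue
        corpus.append((strip_emoticons(text), label))
    return corpus


# Made-up example inputs; "Acme Corp" and "Globex" are placeholder names.
SAMPLES = [
    "Loving the new Acme Corp earnings report :)",
    "Acme Corp support kept me on hold for an hour :(",
    "Globex shipped early :)",
    "Acme Corp announced a new CEO today",
    "Acme Corp :) or :( ?",
]

if __name__ == "__main__":
    for cleaned, label in build_corpus(SAMPLES, ["Acme Corp"]):
        print(label, "|", cleaned)
```

Texts with no emoticon or with mixed emoticons are dropped rather than guessed at, which trades corpus size for cleaner labels. Any data labelled this way is best spot-checked by hand (or by Mechanical Turk workers) before training on it.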