
How to build and label a non-English dataset for sentiment analysis


I've recently started a new sentiment analysis project, and I need to build a dataset in Persian. Since the dataset largely determines the accuracy of the whole pipeline, I want to build it as well as possible in the shortest time. What is the most efficient way to build and label a sentiment analysis dataset?


Solution

  • You can use existing datasets as a reference for your own. There are many sources of sentiment analysis datasets:

    google

    sananalytics

    kaggle

    stanford

    Here is a list of datasets that give the sentiment of individual words:

    positivewordsresearch

    I suggest working with the datasets mentioned above to learn how sentiment datasets are structured and how their labels are defined.
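A quick first step when studying any of these datasets is to count how often each label occurs. Here is a minimal Python sketch using only the standard library. It assumes the dataset is a CSV file with a header row and a column named `label`; that schema is hypothetical, so adjust the column name to whichever dataset you download. `label_distribution` is an illustrative helper, not part of any of the sources above.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_file, label_column="label"):
    """Count how often each label appears in a CSV sentiment dataset.

    Assumes (hypothetically) one example per row and a header row that
    contains `label_column`; adapt this to the real dataset's schema.
    """
    reader = csv.DictReader(csv_file)
    # Normalize whitespace and case so "Positive " and "positive" are merged
    counts = Counter(row[label_column].strip().lower() for row in reader)
    total = sum(counts.values())
    # (label, count, fraction of all rows), most frequent label first
    return [(label, n, n / total) for label, n in counts.most_common()]


if __name__ == "__main__":
    # Toy in-memory sample standing in for a downloaded dataset file
    sample = io.StringIO(
        "text,label\n"
        "great movie,positive\n"
        "boring plot,negative\n"
        "loved it,positive\n"
        "it was okay,neutral\n"
    )
    for label, n, frac in label_distribution(sample):
        print(f"{label}: {n} ({frac:.0%})")
```

For a real file, open it with `open("dataset.csv", newline="", encoding="utf-8")` (the file name is a placeholder) and pass the file object in; the explicit UTF-8 encoding matters once you work with Persian text. A strongly skewed distribution is worth noticing early, because it shows which classes you will need to collect more examples of.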

    Sentiment datasets generally use a small, fixed set of labels, such as "positive"/"negative"; "happy", "sad", "angry", and "neutral"; or "anger", "sadness", "surprise", "fear", "disgust", and "joy".
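One way to combine word-level sentiment lists with the goal of labeling quickly is to pre-label unlabeled sentences with a lexicon, then have annotators only confirm or correct each suggestion instead of labeling from scratch. The sketch below assumes a tiny hypothetical Persian lexicon (four hand-picked words with made-up scores), naive regex tokenization, and the three-way positive/negative/neutral scheme. The function names and the `threshold` parameter are illustrative, not taken from any specific tool.

```python
import re

# Hypothetical toy lexicon with made-up scores. In practice you would load
# a real Persian sentiment word list instead of these four entries.
LEXICON = {
    "خوب": 1,      # good
    "عالی": 2,     # excellent
    "بد": -1,      # bad
    "افتضاح": -2,  # terrible
}

# Naive tokenizer: \w matches Unicode letters, including Persian script
TOKEN_RE = re.compile(r"\w+")


def suggest_label(text, lexicon=LEXICON, threshold=1):
    """Sum the lexicon scores of the words in `text` and map the total
    to a coarse label; words missing from the lexicon count as 0."""
    score = sum(lexicon.get(tok, 0) for tok in TOKEN_RE.findall(text))
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def pre_label(texts, lexicon=LEXICON):
    """Attach a suggested label to each text; `final` stays None until
    a human annotator confirms or corrects the suggestion."""
    return [
        {"text": t, "suggested": suggest_label(t, lexicon), "final": None}
        for t in texts
    ]


if __name__ == "__main__":
    for item in pre_label(["فیلم خوب بود", "خیلی بد بود", "فیلم بود"]):
        print(item["suggested"], "|", item["text"])
```

Lexicon suggestions miss negation, sarcasm, and words that are not in the list, so treat `suggested` as a starting point and train only on the `final` labels a person has checked. A real Persian pipeline would also normalize character variants (for example the Arabic and Persian forms of yeh and kaf) and handle the zero-width non-joiner inside words, which this regex tokenizer splits on.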

    Hope this helps.