python nlp bert-language-model roberta-language-model

Dutch sentiment analysis with RobBERTje outputs only positive/negative labels; the neutral label is missing


When I run Dutch sentiment analysis with RobBERTje, it outputs only positive/negative labels; the neutral label is missing from the output.

https://huggingface.co/DTAI-KULeuven/robbert-v2-dutch-sentiment

Some inputs are obviously neutral, e.g. 'Fhdf' (nonsense) and 'Als gisteren inclusief blauw' (neutral), but both are classified as either positive or negative.

Is there a way to get neutral labels for such examples in RobBERTje?

from transformers import RobertaTokenizer, RobertaForSequenceClassification
from transformers import pipeline
import torch

model_name = "DTAI-KULeuven/robbert-v2-dutch-sentiment"
model = RobertaForSequenceClassification.from_pretrained(model_name)
tokenizer = RobertaTokenizer.from_pretrained(model_name)

classifier = pipeline('sentiment-analysis', model=model, tokenizer = tokenizer)

result1 = classifier('Fhdf')
result2 = classifier('Als gisteren inclusief blauw')
print(result1)
print(result2)

Output:

[{'label': 'Positive', 'score': 0.7520257234573364}]
[{'label': 'Negative', 'score': 0.7538396120071411}]

Solution

  • This model was trained only on negative and positive labels. Therefore, it will try to categorize every input as positive or negative, even if it is nonsensical or neutral.

    What you can do: (1) find another model that was trained with a neutral label; (2) fine-tune this model on a dataset that includes a neutral label; (3) empirically define a threshold on the confidence score and interpret predictions below it as neutral.

    The first two options take considerable effort, so I would suggest the third as a quick workaround. Feed the model a few neutral inputs and observe the range of confidence scores in the output, then use that range to pick the threshold below which a prediction is classified as neutral.

    Here's a sample:

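    def classify_with_neutral(text, threshold=0.8):
        # With only two classes the top score is always >= 0.5, so the
        # threshold must be above 0.5; 0.8 is a placeholder to tune.
        result = classifier(text)[0]  # Top label and its confidence score
        if result['score'] < threshold:
            result['label'] = 'Neutral'  # Low confidence: treat as neutral
        return result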
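
The step of picking the threshold from a few neutral inputs could be sketched roughly as below. This is only an illustration, not part of the model or the transformers API: the helper names `suggest_neutral_threshold` and `classify_with_cutoff` and the `margin` parameter are made up here, and `classify` is assumed to be any callable that returns a list like `[{'label': ..., 'score': ...}]`, such as the `classifier` pipeline from the question.

```python
def suggest_neutral_threshold(classify, neutral_texts, margin=0.02):
    """Suggest a cutoff slightly above the highest score seen on neutral inputs.

    `classify` is assumed to return [{'label': str, 'score': float}] per text.
    `margin` is a hypothetical safety buffer, not a library setting.
    """
    scores = [classify(text)[0]["score"] for text in neutral_texts]
    # Anything the model scores at or below these neutral examples
    # (plus the margin) will later be treated as neutral.
    return min(1.0, max(scores) + margin)


def classify_with_cutoff(classify, text, threshold):
    """Return the top prediction, relabelled 'Neutral' when below the cutoff."""
    result = dict(classify(text)[0])  # copy so the original output is untouched
    if result["score"] < threshold:
        result["label"] = "Neutral"
    return result
```

With the pipeline from the question, this would be called as `threshold = suggest_neutral_threshold(classifier, ['Fhdf', 'Als gisteren inclusief blauw'])` followed by `classify_with_cutoff(classifier, some_text, threshold)`. Keep in mind that a cutoff chosen this way will also relabel genuinely positive or negative sentences that the model happens to score with low confidence, so it is worth checking it against a few clearly opinionated sentences as well.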