python tensorflow sentiment-analysis bert-language-model roberta-language-model

Is it necessary to re-train BERT models, specifically the RoBERTa model?


I am looking for sentiment analysis code with at least 80% accuracy. I tried VADER and found it easy to use, but it only reached 64% accuracy.

Now I am looking at BERT models, and it seems they need to be re-trained. Is that correct? Aren't they already pre-trained? Is re-training necessary?


Solution

  • Re-training is not required if someone has already fine-tuned a model for your task. HuggingFace hosts plenty of such pre-trained models; search the model hub for emotion or sentiment models.

    Here is an example using EmoRoBERTa, a model that predicts 28 emotion labels (27 emotions plus neutral). The implementation works, but it is very slow on large datasets because it calls the model once per row.

    import pandas as pd
    from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, pipeline

    # Load the tokenizer and the fine-tuned TensorFlow model from the HuggingFace Hub
    tokenizer = RobertaTokenizerFast.from_pretrained("arpanghoshal/EmoRoBERTa")
    model = TFRobertaForSequenceClassification.from_pretrained("arpanghoshal/EmoRoBERTa")

    # Reuse the objects loaded above instead of loading the model a second time
    emotion = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

    # example data
    DATA_URI = "https://github.com/AFAgarap/ecommerce-reviews-analysis/raw/master/Womens%20Clothing%20E-Commerce%20Reviews.csv"
    dataf = pd.read_csv(DATA_URI, usecols=["Review Text"])

    # This is super slow (one pipeline call per row), I will find a better optimization ASAP
    dataf = (
        dataf
        .head(50)  # comment this out for the whole dataset
        .assign(
            Emotion=lambda d: d["Review Text"]
            .fillna("")  # missing reviews become empty strings
            .map(lambda x: emotion(x)[0].get("label", None))
        )
    )
    
    

    We could also refactor it a bit:

    ...
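    # a bit faster than the previous (empty reviews skip the model) but still slow
    from typing import Optional

    def emotion_func(text: str) -> Optional[str]:
        # Missing reviews arrive as NaN (a float), so check the type before calling the model
        if not isinstance(text, str) or not text.strip():
            return None
        return emotion(text)[0].get("label", None)

    dataf = (
        dataf
        .head(50)  # comment this out for the whole dataset
        .assign(Emotion=lambda d: d["Review Text"].map(emotion_func))
    )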
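
    Much of the remaining cost comes from calling the pipeline once per review. HuggingFace pipelines also accept a list of texts plus a batch_size argument, so one possible improvement is to send every non-empty review in a single call. The sketch below assumes the classifier argument is the emotion pipeline defined earlier; label_texts and the batch_size default of 32 are hypothetical choices made for illustration. How much this helps depends on the backend and hardware: pipeline batching is aimed mainly at PyTorch models, so the gain with the TensorFlow model used here may be small.

```python
from typing import Callable, List, Optional, Sequence


def label_texts(
    classifier: Callable[..., List[dict]],
    texts: Sequence[object],
    batch_size: int = 32,
) -> List[Optional[str]]:
    """Label all texts with one classifier call; blank or missing texts map to None."""
    # Remember where the usable texts sit so the results can be put back in place
    positions = [i for i, t in enumerate(texts) if isinstance(t, str) and t.strip()]
    labels: List[Optional[str]] = [None] * len(texts)
    if not positions:
        return labels
    # A pipeline called with a list returns one result dict per input text
    results = classifier([texts[i] for i in positions], batch_size=batch_size)
    for i, result in zip(positions, results):
        labels[i] = result.get("label")
    return labels


# Usage with the objects defined earlier (not run here):
# dataf = dataf.assign(Emotion=label_texts(emotion, dataf["Review Text"].tolist()))
```

    Passing a plain list (via .tolist()) keeps the positional indexing inside the helper independent of the DataFrame index.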
    
    

    Results (from the first 50 reviews, abridged):

        Review Text   Emotion
    0   Absolutely wonderful - silky and sexy and comf...   admiration
    1   Love this dress! it's sooo pretty. i happene...   love
    2   I had such high hopes for this dress and reall...   fear
    3   I love, love, love this jumpsuit. it's fun, fl...   love
    ...
    6   I aded this in my basket at hte last mintue to...   admiration
    7   I ordered this in carbon for store pick up, an...   neutral
    8   I love this dress. i usually get an xs but it ...   love
    9   I'm 5"5' and 125 lbs. i ordered the s petite t...   love
    ...
    16  Material and color is nice. the leg opening i...   neutral
    17  Took a chance on this blouse and so glad i did...   admiration
    ...
    26  I have been waiting for this sweater coat to s...   excitement
    27  The colors weren't what i expected either. the...   disapproval
    ...
    31  I never would have given these pants a second ...   love
    32  These pants are even better in person. the onl...   disapproval
    33  I ordered this 3 months ago, and it finally ca...   disappointment
    34  This is such a neat dress. the color is great ...   admiration
    35  Wouldn't have given them a second look but tri...   love
    36  This is a comfortable skirt that can span seas...   approval
    ...
    40  Pretty and unique. great with jeans or i have ...   admiration
    41  This is a beautiful top. it's unique and not s...   admiration
    42  This poncho is so cute i love the plaid check ...   love
    43  First, this is thermal ,so naturally i didn't ...   love
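
    Since the question asks for at least 80% sentiment accuracy, it is worth measuring any pre-trained model against a sample you have labelled by hand before deciding whether fine-tuning is needed. The sketch below collapses emotion labels into coarse sentiment classes and computes accuracy. The grouping in EMOTION_TO_SENTIMENT is an assumption made for illustration (it covers only the labels that appear in the results above), as are the helper names and the made-up hand labels in the toy data at the end.

```python
from typing import Dict, Optional, Sequence

# Hypothetical grouping of emotion labels into sentiment classes; not part of the model
EMOTION_TO_SENTIMENT: Dict[str, str] = {
    "admiration": "positive",
    "love": "positive",
    "excitement": "positive",
    "approval": "positive",
    "neutral": "neutral",
    "fear": "negative",
    "disapproval": "negative",
    "disappointment": "negative",
}


def to_sentiment(emotion_label: Optional[str]) -> str:
    """Map an emotion label to a sentiment class; unknown or missing labels become 'neutral'."""
    if emotion_label is None:
        return "neutral"
    return EMOTION_TO_SENTIMENT.get(emotion_label, "neutral")


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of positions where the predicted class equals the hand-assigned class."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one labelled example")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Toy data for illustration only: model labels vs. made-up hand labels
predicted = [to_sentiment(label) for label in ["love", "fear", "neutral", "admiration"]]
expected = ["positive", "positive", "neutral", "positive"]
print(f"accuracy: {accuracy(predicted, expected):.2f}")  # prints "accuracy: 0.75"
```

    If accuracy on your own labelled sample stays below the target, fine-tuning a pre-trained model on that data is the usual next step.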