python nlp nltk

Use Natural Language Processing to Split Bad & Good Comments from an Employee Survey


This is a bit of a long shot, and I apologize for the lack of information, but I'm struggling to even know where to look.

I'm trying to split good and bad comments from a made-up survey of employees at a random company. All I have is a dataframe containing each comment an employee has made, along with their manager's ID code. The idea is to see how many good and/or bad comments are associated with each manager via their ID.

import pandas as pd
trial_text = pd.read_csv("trial.csv")
trial_text.head()

   ManagerCode              Comment
0        AB123  Great place to work
1        AB123  Need more training
2        AB123  Hate working here
3        AB124  Always late home
4        AB124  Manager never listens

I've used NLTK quite a lot on data sets that include far more information, so anything NLTK-based won't be a problem. With the little I have here, though, Google returns so much information that I don't know where to begin or what is actually useful. If anyone has a suggestion that could put me on the right track, that would be great!

Thanks


Solution

  • You need sentiment analysis. I don't think you will get amazing results with an off-the-shelf model though, because your responses are quite short and quite domain-specific. In case you want to try anyway, here is an example of how to use the VADER model with NLTK:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download('vader_lexicon')  # one-time download of the VADER lexicon

    sid = SentimentIntensityAnalyzer()
    sid.polarity_scores('Great place to work')
    # {'neg': 0.0, 'neu': 0.423, 'pos': 0.577, 'compound': 0.6249}
    sid.polarity_scores('Manager never listens')
    # {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
    

    As you can see, your mileage may vary: the second comment is clearly negative, yet VADER scores it as entirely neutral.
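To get from per-comment scores to the per-manager counts the question asks for, something like the following might work. It is only a sketch: the names `label`, `tally_by_manager` and `make_vader_scorer` are made up for illustration, and the ±0.05 cut-off on the compound score is the conventional VADER default rather than anything tuned to survey comments. The scorer is passed in as a function, so the counting logic can be tried without NLTK installed.

```python
from collections import Counter, defaultdict


def label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to 'good', 'bad' or 'neutral'."""
    if compound >= threshold:
        return "good"
    if compound <= -threshold:
        return "bad"
    return "neutral"


def tally_by_manager(rows, score):
    """Count labels per manager.

    rows  -- iterable of (manager_code, comment) pairs
    score -- callable returning a compound score for one comment
    """
    counts = defaultdict(Counter)
    for manager, comment in rows:
        counts[manager][label(score(comment))] += 1
    return {manager: dict(c) for manager, c in counts.items()}


def make_vader_scorer():
    """Build a scorer backed by NLTK's VADER (requires nltk to be installed)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    sid = SentimentIntensityAnalyzer()
    return lambda comment: sid.polarity_scores(comment)["compound"]
```

With the dataframe from the question, a call might look like `tally_by_manager(zip(trial_text["ManagerCode"], trial_text["Comment"]), make_vader_scorer())`, which returns a dict of label counts keyed by manager code. Comments that VADER scores as 0.0, such as 'Manager never listens', end up in the 'neutral' bucket, so the counts are only as good as the underlying scores.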

    If you have lots of responses (thousands), a more viable strategy would be to manually label a sample of, say, a few dozen to a few hundred comments and train your own sentiment classifier on them. Here are some good tutorials on how to do this with either nltk or sklearn
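To make the "label a sample and train your own classifier" idea concrete, here is a minimal, dependency-free sketch of a bag-of-words naive Bayes classifier with add-one smoothing. It is a stand-in for illustration only: in practice you would more likely reach for `nltk.NaiveBayesClassifier` or an sklearn pipeline, and the names `tokenize`, `train` and `predict` are hypothetical. It assumes the hand-labelled sample is a list of (comment, label) pairs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled):
    """Fit word counts per label from hand-labelled (comment, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for comment, lab in labelled:
        label_counts[lab] += 1
        word_counts[lab].update(tokenize(comment))
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def predict(model, comment):
    """Return the most probable label for a comment (None if untrained)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for lab, n in label_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[lab].values()) + len(vocab)
        logp = math.log(n / total)
        for w in tokenize(comment):
            if w in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[lab][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = lab, logp
    return best_label
```

After `model = train(labelled_sample)`, calling `predict(model, comment)` on each unlabelled comment gives a label that can be counted per manager. With only a handful of labelled examples the predictions will be unreliable, so it is worth holding back part of the labelled sample to check accuracy before trusting the counts.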