python nlp ticket-system

How to calculate ticket classification after putting in a sentence? (Python/NLP)


I trained a model to classify tickets into 2 categories, using GradientBoostingClassifier. Now I want to write a function that takes any sentence and has the trained model calculate the probability that it belongs to category 1 or category 2. How do I write the code for this?

Let's imagine the sentence that I want to use is the ticket description: "Lab Research Assistant is trying to create a Clinical Activity Report"

def preprocess(sentence):
    # remove stop words with the same remove_stopwords helper used on the training data
    cleaned = remove_stopwords(sentence)
    # wrap the text in a list: list(sentence) would split it into single characters
    return [cleaned]

ticket = preprocess('Lab Research Assistant is trying to create a Clinical Activity Report')
ticket

model.predict(ticket)
model.predict_proba(ticket)

Thank you!


Solution

  • Note: this answer assumes you already have a model that classifies sentences and gives you an output, since you said "I trained a model to classify tickets into 2 categories".

    If you already have a model that classifies the sentence, there is no need to write another function to determine the probability.

    That is because the classification is based on the model's final output, which is itself a matrix of probabilities.

    For example, take a case with two classes (just like yours). Then there are two output nodes:

    node 0 --> class 1

    node 1 --> class 2

    If the output of node 0 is 0.943, then the output of node 1 is 1 - 0.943 = 0.057, because the probabilities always add up to 1. The output matrix is [0.943, 0.057], so the sentence belongs to class 1. The probabilities are computed before the class is chosen, so whenever you have the class you already have the probabilities; you just have to get the scores. If you are using a third-party library, there is already a function for this (in scikit-learn it is predict_proba, which you are already calling). If you are building a model from scratch, just add a line to print or return the probability scores.

    Edit:

    In the training code for the model:

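    from sklearn.feature_extraction.text import CountVectorizer

    # keep only the 1500 most frequent terms as features
    countvectors = CountVectorizer(max_features=1500)
    # learn the vocabulary from the training text and vectorize it
    X = countvectors.fit_transform(df['CleanDescr'])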

    A CountVectorizer was used to turn the training text into vectors. You must use that same fitted vectorizer (with the same number of features) to convert the sentences you want to predict before passing them to the predictor: call countvectors.transform on the new sentences, not fit_transform, so that the vocabulary learned in training is reused.
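
Putting the pieces together, here is a minimal end-to-end sketch of the approach described above. It is only an illustration under stated assumptions: the toy train_texts and train_labels are invented stand-ins for df['CleanDescr'] and its labels, and classify_ticket is a hypothetical helper name, not something from the original code. It shows the fitted countvectors being reused via transform, and the two probabilities from predict_proba adding up to 1.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for df['CleanDescr'] and its labels.
train_texts = [
    "lab assistant create clinical activity report",
    "research assistant clinical report missing data",
    "clinical activity report lab results",
    "lab research report export fails",
    "reset password locked account",
    "cannot login account password expired",
    "password reset email not received",
    "account locked after failed login",
]
train_labels = [1, 1, 1, 1, 2, 2, 2, 2]

# Fit the vectorizer once, on the training text only.
countvectors = CountVectorizer(max_features=1500)
X = countvectors.fit_transform(train_texts)

model = GradientBoostingClassifier(random_state=0)
model.fit(X, train_labels)


def classify_ticket(sentence):
    """Return (predicted_class, {class: probability}) for one raw sentence."""
    # transform, not fit_transform: reuse the vocabulary learned in training.
    features = countvectors.transform([sentence])
    # predict_proba returns one row per document, one column per class.
    row = model.predict_proba(features)[0]
    predicted = int(model.predict(features)[0])
    probabilities = {int(c): float(p) for c, p in zip(model.classes_, row)}
    return predicted, probabilities


predicted, probabilities = classify_ticket(
    "Lab Research Assistant is trying to create a Clinical Activity Report"
)
print(predicted, probabilities)
```

The columns of predict_proba follow the order of model.classes_, which is why the sketch zips the two together instead of assuming that column 0 is category 1. If the training text went through extra cleaning (such as stop word removal), the same cleaning should be applied to the new sentence before calling transform.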