python scikit-learn naivebayes

Figure out which words a Naive Bayes classifier uses to decide the class


I'm doing text classification with Naive Bayes in Python and want to figure out which words the classifier uses to decide which class a text belongs to.

I have found this answer https://stackoverflow.com/a/62097661/3992979, but it doesn't help me because my vectorizer doesn't have a get_feature_names() method and my Naive Bayes classifier has no coef_ attribute.
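The linked answer targets older scikit-learn releases. In current releases, `CountVectorizer.get_feature_names()` has been replaced by `get_feature_names_out()` (added in 1.0), and the naive Bayes estimators no longer expose `coef_`; the per-class log probabilities are stored in `feature_log_prob_` instead. A minimal sketch of a helper that returns the vocabulary on either side of that rename (the helper name `feature_names` and the two example sentences are hypothetical):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def feature_names(vectorizer):
    """Hypothetical helper: return the fitted vocabulary as a NumPy array on old or new scikit-learn."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # scikit-learn >= 1.0
        return np.asarray(vectorizer.get_feature_names_out())
    # Older releases only have the list-returning method
    return np.asarray(vectorizer.get_feature_names())


vec = CountVectorizer(ngram_range=(1, 2))
vec.fit(["bomb attack in city", "city council meeting"])
names = feature_names(vec)
print(names)  # unigrams and bigrams, sorted alphabetically
```

The column order of `names` matches the columns of the document-term matrix and of `feature_log_prob_`, which is what makes indexing `names` with `argsort` results work.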

df_train is a data frame with manually labelled training data; df_test is a data frame with the unlabelled data that NB should classify. There are only two classes: "terror" 1 for texts about terrorist attacks and "terror" 0 for texts without that topic.

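from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

### Create "Bag of Words"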
vec = CountVectorizer(
    ngram_range=(1, 3)
)

x_train = vec.fit_transform(df_train.clean_text)
x_test = vec.transform(df_test.clean_text)

y_train = df_train.terror
y_test = df_test.terror

### Train and evaluate the model (Naive Bayes classification)
nb = MultinomialNB()
nb.fit(x_train, y_train)

preds = nb.predict(x_test)
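After fitting, `MultinomialNB` keeps one row of smoothed log probabilities, log P(n-gram | class), per class in `feature_log_prob_`, and the rows are ordered like `nb.classes_`. The sketch below uses a few made-up sentences (hypothetical stand-ins for `df_train.clean_text` and `df_train.terror`) to show that shape and that each row exponentiates to a probability distribution over the vocabulary:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df_train.clean_text / df_train.terror
texts = [
    "bomb attack kills civilians",
    "gunmen attack police station",
    "city council approves budget",
    "football team wins the cup",
]
labels = [1, 1, 0, 0]

vec = CountVectorizer(ngram_range=(1, 3))
x_train = vec.fit_transform(texts)

nb = MultinomialNB()
nb.fit(x_train, labels)

# One row per class, one column per n-gram; rows follow nb.classes_
print(nb.classes_)                  # class labels in row order
print(nb.feature_log_prob_.shape)   # (n_classes, n_features)

# Each row is a smoothed distribution over the vocabulary, so it sums to 1
row_sums = np.exp(nb.feature_log_prob_).sum(axis=1)
print(row_sums)
```

With labels 0 and 1, row 1 of `feature_log_prob_` therefore belongs to "terror" 1.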

Solution

  • I figured it out by trial and error:

    import numpy as np

    ### Get the 100 most probable n-grams for each class
    def show_top100(classifier, vectorizer, categories):
        feature_names = vectorizer.get_feature_names_out()
        for i, category in enumerate(categories):
            # feature_log_prob_[i] holds log P(n-gram | class i); take the 100 largest, highest first
            top100 = np.argsort(classifier.feature_log_prob_[i])[-100:][::-1]
            print("%s: %s" % (category, " ".join(feature_names[top100])))

    show_top100(nb, vec, nb.classes_)