python | nlp | text-classification | lime

NLP | LimeTextExplainer for bigrams


In my NLP task I want to understand the 'rules' of my classifier. For that purpose, I built a LimeTextExplainer.

from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

c = make_pipeline(cv, naive_bayes)
explainer = LimeTextExplainer(class_names=class_names, random_state=42, bow=False)
exp = explainer.explain_instance(X_test[i], c.predict_proba, num_features=20)
fig = exp.as_pyplot_figure()

The above code creates a nice list of unigrams, exactly as I wanted.

As a next step I want to do the same, but with bigrams, so I changed the feature extractor to only produce bigrams:

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(strip_accents='ascii', analyzer='word',
                     token_pattern=u'(?ui)\\b\\w*[a-z]+\\w*\\b',
                     lowercase=True, stop_words='english',
                     ngram_range=(2, 2), max_features=None)
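
A quick sanity check (the sample sentence below is made up for illustration, not from my data) shows that the vectorizer now really emits only bigram features:

# hypothetical example document, just to inspect the resulting vocabulary
docs = ["speech recognition and natural language processing"]
cv.fit(docs)
print(cv.get_feature_names_out())
# -> ['language processing' 'natural language' 'recognition natural' 'speech recognition']
# ("and" is dropped by stop_words='english' before the bigrams are built)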

The problem(s):

  1. I use the same code for the LimeTextExplainer as above, but the graph still only shows unigrams, even though I only calculated bigrams.
  2. As a side question: does the horizontal axis of the graph display the absolute contribution of each word to the classification probability? For instance, if the text's class-X probability is 0.67, does 'recognit' account for ~0.009 and 'language' for ~0.007 of the 0.67?

Thanks in advance!


Solution

  • At least I got an answer to the second question:

    Those are probabilities, but not in the way I thought.

    For instance, the predicted probability for class X is 0.808. If the word 'recognit' were now removed from the underlying text, the predicted probability for that class would shrink by 0.008, i.e. the class-X probability would then be 0.800.
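
    A minimal sketch of how to read those weights programmatically (assuming the pipeline c, the explainer, and X_test from the question):

    proba = c.predict_proba([X_test[i]])[0]  # e.g. class X -> 0.808
    exp = explainer.explain_instance(X_test[i], c.predict_proba, num_features=20)
    for word, weight in exp.as_list():       # e.g. ('recognit', 0.008)
        # each weight approximates the change in the predicted probability
        # if this word were removed from the instance; negative weights
        # push the prediction away from the class
        print(f"'{word}': ~{weight:+.3f}")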

    For detailed information about LIME I highly recommend: "Why Should I Trust You?": Explaining the Predictions of Any Classifier, Ribeiro et al. (2016)