This is my first time using LIME, and I have never used any interpretation technique before.
Most likely I am doing something wrong, but I cannot figure out what it is.
I tried googling and going through Stack Overflow questions to find a way to resolve this, but did not find anything that helps.
My dataset df_reps looks like this:
Toyota Horse Toyota Gear... Mazda Night King
Green Mazda King Toyota ... Blue Mazda Toyota
...
...
Gear Tyre Toyota Geaer ... Horse Blue Park
Laptop Invoice Toyota ... Horse Mango Kitkat
The label to predict is whether the customer approved or not, so the labels are only 0 and 1.
Here is my code:
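import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def BOW(df):
    CountVec = CountVectorizer()  # pass ngram_range=(2, 2) to use only bigrams
    Count_data = CountVec.fit_transform(df)  # df must be a 1-D sequence of strings, e.g. a Series
    Count_data = Count_data.astype(np.uint8)
    # one column per vocabulary word, keeping the original row index
    cv_dataframe = pd.DataFrame(Count_data.toarray(), columns=CountVec.get_feature_names_out(), index=df.index)
    return cv_dataframe.astype(np.uint8)

df = BOW(df_reps)
y = df_Labels  # this is either 0 or 1
X = df
X_train, X_test, y_train, y_test = train_test_split(X, y)
clf = RandomForestClassifier(max_depth=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)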
Since I converted the text into tabular format using BOW, I am using LimeTabularExplainer.
Here is the part for LIME:
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(X_train.values, feature_names=X_train.columns, verbose=True, mode='classification')
exp = explainer.explain_instance(X_test.values[1], clf.predict, num_features=10000)
But I am getting this error:
NotImplementedError: LIME does not currently support classifier models without probability scores. If this conflicts with your use case, please let us know: https://github.com/datascienceinc/lime/issues/16
The LimeTabularExplainer requires probabilities, not predictions. So instead of passing clf.predict, you need to pass either clf.predict_proba or a wrapper function that returns probabilities from features. For example, based on this tutorial:
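predict_fn = lambda x: rf.predict_proba(encoder.transform(x))  # rf and encoder come from the tutorial's own setup
exp = explainer.explain_instance(X_test.values[1], predict_fn)  # explain_instance takes a single row, not the whole test set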
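The difference between the two methods is the shape of what they return. predict gives one hard label per row (a 1-D array). predict_proba gives one probability per class per row (a 2-D array). The error message indicates that LIME rejects the 1-D case in classification mode. Below is a minimal sketch on a hypothetical random count matrix with arbitrary 0/1 labels (not the question's data). It shows both shapes and a hypothetical wrapper named predict_fn that returns probabilities.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy BOW matrix (rows = customers, columns = word counts)
rng = np.random.default_rng(0)
X_train = rng.integers(0, 3, size=(40, 8)).astype(np.uint8)
# Arbitrary alternating 0/1 labels so that both classes are guaranteed to be present
y_train = np.tile([0, 1], 20)

clf = RandomForestClassifier(max_depth=100, random_state=0)
clf.fit(X_train, y_train)

sample = X_train[:5]
labels = clf.predict(sample)       # 1-D: one hard label per row
probs = clf.predict_proba(sample)  # 2-D: one probability per class per row
print(labels.shape, probs.shape)   # (5,) (5, 2)

def predict_fn(rows):
    """Hypothetical wrapper: apply any preprocessing the model needs, then return probabilities."""
    rows = np.asarray(rows, dtype=np.uint8)
    return clf.predict_proba(rows)

# Each row of class probabilities sums to 1
assert np.allclose(predict_fn(sample).sum(axis=1), 1.0)
```

In the question's setup the BOW columns are already the model's input, so no encoder is needed. Passing clf.predict_proba directly should work, e.g. explainer.explain_instance(X_test.values[1], clf.predict_proba, num_features=10).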