I have an imbalanced classification problem and I am using make_pipeline from imblearn. The steps are the following:
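from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import RobustScaler

kf = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
params = {
    'max_depth': [2, 3, 5],
    # 'max_features': ['auto', 'sqrt', 'log2'],
    # 'min_samples_leaf': [5, 10, 20, 50, 100, 200, 300],
    'n_estimators': [10, 25, 30, 50]
    # 'bootstrap': [True, False]
}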
from imblearn.pipeline import make_pipeline
imba_pipeline = make_pipeline(SMOTE(random_state = 42), RobustScaler(), RandomForestClassifier(random_state=42))
imba_pipeline
Out: Pipeline(steps=[('smote', SMOTE(random_state=42)),
                     ('robustscaler', RobustScaler()),
                     ('randomforestclassifier',
                      RandomForestClassifier(random_state=42))])
new_params = {'randomforestclassifier__' + key: params[key] for key in params}
grid_imba = GridSearchCV(imba_pipeline, param_grid=new_params, cv=kf, scoring='recall',
                         return_train_score=True, n_jobs=-1, verbose=2)
grid_imba.fit(X_train, y_train)
Everything works and I get to the end of my problem (i.e. I can see the classification report).
However, when I try to look inside the black box with eli5, using eli5.explain_weights(imba_pipeline), I get the following error:
TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTE(random_state=42)' (type <class 'imblearn.over_sampling._smote.SMOTE'>) doesn't
I know that this is a common problem and I have read the related questions, but I am confused because the error occurs after my classification procedure has already finished.
Any suggestions?
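Besides the random forest, your pipeline contains two intermediate steps: the SMOTE oversampler and the scaler. It looks like eli5 explains a pipeline by treating every step before the last one as a scikit-learn transformer (something that implements fit and transform). SMOTE is a sampler: it implements fit_resample instead of transform, so that check fails. This is also why the error only appears when you call eli5, after training has finished. To get the weight explanation of the random forest, you could try calling eli5 only on that step of the pipeline with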
from eli5 import explain_weights
explain_weights(imba_pipeline['randomforestclassifier'])
provided the pipeline is fitted. In your code, however, you fitted the grid search, which fits clones of imba_pipeline and leaves imba_pipeline itself unfitted, so
explain_weights(grid_imba.best_estimator_['randomforestclassifier'])
would be more appropriate.