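from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif

smt = SMOTE(random_state=0)
# preprocessor is defined elsewhere (not shown in the question)
pipeline_rf_smt_fs = Pipeline(
    [
        ('preprocess', preprocessor),
        ('selector', SelectKBest(mutual_info_classif, k=30)),
        ('smote', smt),
        ('rf_classifier', RandomForestClassifier(n_estimators=600, random_state=2021)),
    ]
)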
I am getting the error below: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTE(random_state=0)' (type <class 'imblearn.over_sampling._smote.SMOTE'>) doesn't
I believe SMOTE has to be used after the feature selection step. Any help with this would be appreciated.
This is the error message raised by scikit-learn's version of Pipeline. Your code, as written, should not produce this error, but you have probably run from sklearn.pipeline import Pipeline somewhere, which has overwritten the Pipeline object imported from imblearn.
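One way to check which Pipeline the name currently points to is to inspect the class's `__module__` attribute. The following is a minimal sketch; the helper name `top_level_package` is hypothetical and not part of either library, and the import order is only an assumption about what might have happened in the session.

```python
def top_level_package(obj):
    """Return the top-level package that defines a class or function."""
    return obj.__module__.split(".")[0]


try:
    from imblearn.pipeline import Pipeline
    # A later import like this one silently rebinds the name Pipeline.
    from sklearn.pipeline import Pipeline

    # Shows which package's Pipeline is actually in use at this point.
    print(top_level_package(Pipeline))
except ImportError:
    print("scikit-learn and imbalanced-learn are needed for this check")
```

If this reports sklearn rather than imblearn, re-running from imblearn.pipeline import Pipeline just before building the pipeline (or importing it under an alias) should restore the version that accepts samplers.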
From a methodological point of view, I nonetheless find it questionable to use a sampler after the preprocessing and feature selection in a general setting. What if the features you select are relevant because of the imbalance in your dataset? I would prefer using it in the first step of a pipeline (but this is up to you, it should not cause any errors).