
Pipeline for RandomOverSampler, RandomForestClassifier & GridSearchCV


I am working on a binary text classification problem. Because the classes are highly imbalanced, I am using sampling techniques such as RandomOverSampler(). For classification I then use RandomForestClassifier(), whose parameters need to be tuned with GridSearchCV().

I am trying to create a pipeline that runs these steps in order, but so far it fails: it raises an invalid-parameter error.

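from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import make_pipeline  # sklearn's Pipeline requires transform() on intermediate steps, so samplers fail there
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# tfidf_train, predictors and target are defined earlier (not shown)
param_grid = {
    'n_estimators': [5, 10, 15, 20],
    'max_depth': [2, 5, 7, 9]
}
grid_pipe = make_pipeline(RandomOverSampler(), RandomForestClassifier())
grid_searcher = GridSearchCV(grid_pipe, param_grid, cv=10)
grid_searcher.fit(tfidf_train[predictors], tfidf_train[target])  # raises the invalid-parameter error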

Solution

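  • The parameters you defined in param_grid belong to RandomForestClassifier, but you are not passing a RandomForestClassifier object to GridSearchCV.

    You are passing a pipeline object, so each parameter name must be prefixed with the name of the step it belongs to, followed by a double underscore (step__parameter), for GridSearchCV to reach the RandomForestClassifier inside it. make_pipeline names each step after its class in lowercase, so the forest's step is called randomforestclassifier.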

    Change them to:

    param_grid = {
                 'randomforestclassifier__n_estimators': [5, 10, 15, 20],
                 'randomforestclassifier__max_depth': [2, 5, 7, 9]
                 }
    

    With these names, GridSearchCV sets the parameters on the forest step and the search runs.