python scikit-learn gridsearchcv scikit-learn-pipeline

Why is my double underscore notation not working with nested pipelines in scikit-learn?


I'm trying to build a pipeline that contains a preprocessing transformer (it simply removes columns from the data) and an LDA classifier. I want to tune hyperparameters for both, and from other posts and the documentation I understood that I just need to use pipelineName__param, with a double underscore, but that doesn't seem to be working.

import pprint as pp

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

clf_model = LinearDiscriminantAnalysis()

# pp_pipeline (column dropper + StandardScaler) is defined elsewhere
full_pipeline = Pipeline([
    ('preprocessing', pp_pipeline),
    ('model', clf_model),
])

# Lists every parameter name the pipeline accepts; I copied these into param_grid
pp.pprint(sorted(full_pipeline.get_params().keys()))
# >>> [...]

param_grid = {
    "preprocessing__dropper__drop_attr": [True, False],
    "model__solver": ["svd", "lsqr", "eigen"],
}

search = GridSearchCV(clf_model, param_grid, scoring="f1", return_train_score=True, cv=5, verbose=2, n_jobs=-1)
search.fit(X_train, y_train)

pp_pipeline is a pipeline containing the column-dropping transformer and a StandardScaler. I have tested it on X_train on its own and it works as expected.
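To see how the nested names in param_grid are formed, here is a minimal sketch. Since the column dropper isn't shown, the stand-in pp_pipeline below (an assumption) contains only a StandardScaler step; the naming rule is the same for any inner step.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for pp_pipeline: the question's column dropper is not shown,
# so only the StandardScaler step is reproduced here.
pp_pipeline = Pipeline([("scaler", StandardScaler())])

full_pipeline = Pipeline([
    ("preprocessing", pp_pipeline),
    ("model", LinearDiscriminantAnalysis()),
])

# get_params() (deep=True by default) prefixes each inner parameter with its
# step name(s), joined by "__": <outer step>__<inner step>__<parameter>.
params = full_pipeline.get_params()
nested = sorted(k for k in params if k.count("__") == 2)
print(nested)
print("model__solver" in params)  # True
```

These keys are only meaningful to full_pipeline itself; another estimator will not recognise them.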

The code above raises the following error:

ValueError: Invalid parameter 'model' for estimator LinearDiscriminantAnalysis().

Why is it treating model as a parameter rather than as the name of a pipeline step, even though I've prefixed the parameter with the step name and a double underscore?

I've tried renaming model to something else, and even removing "model__solver" from param_grid entirely; when I do that, I get this error instead:

ValueError: Invalid parameter 'preprocessing' for estimator LinearDiscriminantAnalysis().

so I must be missing something key here.
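The message comes from set_params. GridSearchCV applies each candidate from param_grid by calling set_params on a clone of whatever estimator it was given, so the error can be reproduced without any grid search. This sketch uses a simplified two-step pipeline (an assumption, not the question's exact one):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

lda = LinearDiscriminantAnalysis()
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LinearDiscriminantAnalysis()),
])

# A bare LDA has no parameter called "model", so the prefixed key is rejected
# with a ValueError (the same kind of error GridSearchCV surfaces).
lda_rejected = False
try:
    lda.set_params(model__solver="lsqr")
except ValueError as exc:
    lda_rejected = True
    print(exc)

# A Pipeline splits the key on "__" and forwards solver="lsqr" to its "model" step.
pipe.set_params(model__solver="lsqr")
print(pipe.named_steps["model"].solver)  # lsqr
```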


Solution

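  • I believe the issue is that you are passing the bare model (clf_model, the LDA) to GridSearchCV rather than your pipeline. GridSearchCV applies each candidate in param_grid by calling set_params on (a clone of) the estimator you give it, and LinearDiscriminantAnalysis has no parameters named model or preprocessing; only the Pipeline knows how to route a step__param key to the named step. Your GridSearchCV call should be: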

    search = GridSearchCV(full_pipeline, param_grid,
                          scoring="f1", return_train_score=True, cv=5, verbose=2, n_jobs=-1)
    search.fit(X_train, y_train)
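For an end-to-end check, here is a self-contained sketch of the corrected search. The ColumnDropper class, its columns argument, and the synthetic data from make_classification are hypothetical stand-ins for the question's transformer and X_train/y_train; verbose and n_jobs are left at their defaults here but can be added back as in the call above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical column dropper: removes `columns` when drop_attr is True."""

    def __init__(self, drop_attr=True, columns=(0,)):
        self.drop_attr = drop_attr
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X = np.asarray(X)
        return np.delete(X, list(self.columns), axis=1) if self.drop_attr else X


# Synthetic binary data standing in for X_train / y_train.
X_train, y_train = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

pp_pipeline = Pipeline([("dropper", ColumnDropper()), ("scaler", StandardScaler())])
full_pipeline = Pipeline([
    ("preprocessing", pp_pipeline),
    ("model", LinearDiscriminantAnalysis()),
])

param_grid = {
    "preprocessing__dropper__drop_attr": [True, False],
    "model__solver": ["svd", "lsqr", "eigen"],
}

# The fix: pass full_pipeline (not the bare LDA) as the estimator.
search = GridSearchCV(full_pipeline, param_grid, scoring="f1",
                      return_train_score=True, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```

The keys of best_params_ use the same step__param names as param_grid, so they can be passed straight back to full_pipeline.set_params.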