[SOLVED] grid_pipeline.fit uses default value of solver parameter instead of GridSearchCV value

I am trying to find the best combination of hyperparameters for LogisticRegression in sklearn. Here is my code:

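# Imports assumed from context; SMOTE needs imblearn's Pipeline, since
# sklearn's Pipeline does not accept resampling steps.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([("scaler", StandardScaler()),
                     ("smt",    SMOTE(random_state=42)),
                     ("logreg", LogisticRegression())])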


parameters = [{'logreg__solver': ['saga']},
              {'logreg__penalty':['l1', 'l2']},
              {'logreg__C':[1e-3, 0.1, 1, 10, 100]}]

grid_pipeline = GridSearchCV(pipeline,
                             parameters,
                             scoring='f1',
                             n_jobs=5, verbose=5,
                             return_train_score=True,
                             cv=5)

grid_result = grid_pipeline.fit(X_train,y_train)

During fitting I get the following error:

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

For some reason, the default value 'lbfgs' is used for the solver parameter instead of the 'saga' value I specified. Why does this happen?

Solution

I think the issue is how you have specified parameters. To get the desired behaviour, use a single dict as follows:

parameters = {'logreg__solver': ['saga'],
              'logreg__penalty':['l1', 'l2'],
              'logreg__C':[1e-3, 0.1, 1, 10, 100]
              }

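You specified it as a list of dicts, and GridSearchCV treats each dict in such a list as a separate grid that it searches on its own. The solver, penalty and C values were therefore never combined. When it reached the second dict, it set the penalty to l1 while the solver was still the default lbfgs, which supports only l2 or none, hence the error. With a single dict, every penalty and C value is tried together with solver='saga', which supports both l1 and l2.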
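
If you want to see the difference without running a full fit, you can expand both grids with sklearn's ParameterGrid, which produces the candidate settings for a given param_grid. This is only a minimal sketch for inspection; the variable names list_of_dicts, single_dict and missing_solver are made up for the example.

```python
from sklearn.model_selection import ParameterGrid

# The grid from the question: a list of three dicts, each searched on its own.
list_of_dicts = [{'logreg__solver': ['saga']},
                 {'logreg__penalty': ['l1', 'l2']},
                 {'logreg__C': [1e-3, 0.1, 1, 10, 100]}]

# The corrected grid: one dict, so every key is combined with every other.
single_dict = {'logreg__solver': ['saga'],
               'logreg__penalty': ['l1', 'l2'],
               'logreg__C': [1e-3, 0.1, 1, 10, 100]}

list_candidates = list(ParameterGrid(list_of_dicts))
dict_candidates = list(ParameterGrid(single_dict))

print(len(list_candidates))  # 1 + 2 + 5 candidates
print(len(dict_candidates))  # 1 * 2 * 5 candidates

# Candidates that set a penalty but leave the solver at its default.
missing_solver = [c for c in list_candidates
                  if 'logreg__penalty' in c and 'logreg__solver' not in c]
print(missing_solver)
```

A list of dicts is still useful when you really do want separate sub-grids, for example one dict for saga with both penalties and another for lbfgs with l2 only. In that case each dict has to include the solver key itself.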