python-3.x, scikit-learn, scoring, gridsearchcv

GridSearchCV giving score from the best estimator different from the one indicated in refit parameter


I am doing hyperparameter optimization using GridSearchCV:

from sklearn.metrics import accuracy_score, balanced_accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV

scoring_functions = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': make_scorer(accuracy_score), 'balanced_accuracy': make_scorer(balanced_accuracy_score)}

grid_search = GridSearchCV(pipeline, param_grid=grid, scoring=scoring_functions, n_jobs=-1, cv=splitter, refit='mcc')

I set the refit parameter to 'mcc', so I expect GridSearchCV to choose and refit the best model according to that metric. Then I calculate some of the scores on the test set:

preds = best_model.predict(test_df)
metrics['accuracy'] = round(accuracy_score(test_labels, preds),3)
metrics['balanced_accuracy'] = round(balanced_accuracy_score(test_labels, preds),3)
metrics['mcc'] = round(matthews_corrcoef(test_labels, preds),3)

And I get these results

"accuracy": 0.891, "balanced_accuracy": 0.723, "mcc": 0.871

Now, if I instead get the score of the model on the same test set (without calculating the predictions first), like this:

best_model = grid_search.best_estimator_
score = best_model.score(test_df, test_labels)

The score I get is this

"score": 0.891

As you can see, this is the accuracy, not the MCC score. The documentation of the score function says:

Returns the score on the given data, if the estimator has been refit.

This uses the score defined by scoring where provided, and the best_estimator_.score method otherwise.

I don't understand this correctly. I thought that if the model is refit as specified by the refit parameter of GridSearchCV, the returned score should be computed with the scoring function used to refit the model? Am I missing something?


Solution

  • When you access the best_estimator_ attribute, you are going to the underlying base model and ignoring all the setup you have done on the GridSearchCV object:

    best_model = grid_search.best_estimator_
    score = best_model.score(test_df, test_labels)
    

    You should use grid_search.score() instead and, in general, interact with that object. For example, when predicting, use grid_search.predict().

    The signatures of those methods are the same as those of a standard estimator (fit, predict, score, etc.).

    You can still use the underlying model, but it won't necessarily have inherited the configuration you applied to the grid search object itself, as the sketch below illustrates.
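
    As a minimal sketch (assuming grid_search has already been fit and that test_df / test_labels are the same hold-out data as in the question), compare the two ways of scoring:

    from sklearn.metrics import matthews_corrcoef

    # score() on the GridSearchCV object uses the scorer named in refit
    # ('mcc' here), so it should match matthews_corrcoef computed by hand.
    mcc_score = grid_search.score(test_df, test_labels)

    # predict() also delegates to the refitted best_estimator_,
    # so the manual computation gives the same number.
    preds = grid_search.predict(test_df)
    assert abs(mcc_score - matthews_corrcoef(test_labels, preds)) < 1e-9

    # By contrast, best_estimator_.score() falls back to the estimator's
    # default score method (mean accuracy for classifiers), which is why
    # the question got 0.891, the accuracy, instead of the MCC.
    acc_score = grid_search.best_estimator_.score(test_df, test_labels)

    With multi-metric scoring, the fitted grid_search.scorer_ attribute holds the dict of scorers, and grid_search.score() picks the one named by refit, which is the behaviour described in the documentation quoted in the question.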