Tags: python, scikit-learn, cross-validation, gridsearchcv, auc

Why is the mean ROC AUC score from GridSearchCV, using only one CV split, different from the AUC calculated with the grid_search.score method or the roc_auc_score function?


I was experimenting with sklearn's GridSearchCV, and I don't understand why the mean ROC AUC score I get when using a single split defined with an iterable differs from what I get by running the score method after fitting, or from the roc_auc_score function.

This is my data shape:

print(X.shape)
print(X.index)

print(y.shape)
print(y.index)
(31695, 1379)
RangeIndex(start=0, stop=31695, step=1)
(31695,)
RangeIndex(start=0, stop=31695, step=1)

This is how I define the cv_split:

cv_split =[(np.arange(15848), np.arange(15848,31695))]
cv_split
[(array([    0,     1,     2, ..., 15845, 15846, 15847]),
  array([15848, 15849, 15850, ..., 31692, 31693, 31694]))]
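As an aside (not part of the original question), the same one-off split can also be expressed with scikit-learn's PredefinedSplit, which yields the identical train/test index pair:

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Samples marked -1 are always in the training set;
# samples marked 0 form test fold 0.
test_fold = np.full(31695, -1)
test_fold[15848:] = 0
ps = PredefinedSplit(test_fold)

train_idx, test_idx = next(ps.split())
print(train_idx.shape, test_idx.shape)  # (15848,) (15847,)
```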

Fitting the model and resulting cv_results_:

gs_algorithm = GridSearchCV(estimator=LGBMClassifier(),
                            param_grid=hyperparameter_space,
                            scoring='roc_auc',
                            n_jobs=1,
                            pre_dispatch=1,
                            cv=cv_split,
                            verbose=10,
                            return_train_score=True)
gs_algorithm.fit(X, y)
gs_algorithm.cv_results_
Fitting 1 folds for each of 1 candidates, totalling 1 fits
...
{'mean_fit_time': array([17.40988088]),
 'std_fit_time': array([0.]),
 'mean_score_time': array([1.16691899]),
 'std_score_time': array([0.]),
 'param_colsample_bytree': masked_array(data=[0.2],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_learning_rate': masked_array(data=[0.1],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_max_depth': masked_array(data=[-1],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_min_child_samples': masked_array(data=[3000],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_min_child_weight': masked_array(data=[0],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_n_estimators': masked_array(data=[150],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_num_leaves': masked_array(data=[15000],
              mask=[False],
        fill_value='?',
             dtype=object),
 'param_random_state': masked_array(data=[6],
              mask=[False],
        fill_value='?',
             dtype=object),
 'params': [{'colsample_bytree': 0.2,
   'learning_rate': 0.1,
   'max_depth': -1,
   'min_child_samples': 3000,
   'min_child_weight': 0,
   'n_estimators': 150,
   'num_leaves': 15000,
   'random_state': 6}],
 'split0_test_score': array([0.75898716]),
 'mean_test_score': array([0.75898716]),
 'std_test_score': array([0.]),
 'rank_test_score': array([1], dtype=int32),
 'split0_train_score': array([0.81224109]),
 'mean_train_score': array([0.81224109]),
 'std_train_score': array([0.])}

So it correctly reports the same value for split0_test_score and mean_test_score: 0.75898716.

But then when I try this:

gs_algorithm.score(X.iloc[cv_split[0][1]], y[cv_split[0][1]])
0.8194048788870386

y_pred = gs_algorithm.predict_proba(X)[:, 1]
print(y_pred[cv_split[0][1]].shape)
(15847,)

roc_auc_score(y[cv_split[0][1]], y_pred[cv_split[0][1]])
0.8194048788870386

Why is the mean_test_score reported after fitting the model different from these values?


Solution

  • The score and predict_proba methods of GridSearchCV (your gs_algorithm) use a model that was refit on the entire dataset passed to fit, i.e., with the CV split(s) recombined; see the documentation for the refit parameter. That refit model has already seen the samples in your test fold during training, which is why it scores higher on them (0.819 vs. 0.759).

    The individual per-fold estimators aren't saved, so to reproduce the test-fold score you would need to manually refit an estimator with best_params_ on only the training fold (with randomness controlled, e.g., via random_state).
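A minimal runnable sketch of both points, substituting a RandomForestClassifier on synthetic data for the original LGBMClassifier (the estimator, data, and parameter grid here are illustrative, not from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

# Synthetic data with some label noise, so held-out AUC stays below 1.0.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
# Single predefined split: first half trains, second half tests.
cv_split = [(np.arange(200), np.arange(200, 400))]

gs = GridSearchCV(RandomForestClassifier(random_state=0),
                  param_grid={"n_estimators": [30]},
                  scoring="roc_auc",
                  cv=cv_split)
gs.fit(X, y)

# Score of the fold estimator, trained on the training half only.
fold_score = gs.cv_results_["split0_test_score"][0]

# gs.score uses best_estimator_, which was refit on ALL of X, y --
# including the test fold -- so its AUC on that fold is inflated.
refit_score = gs.score(X[200:], y[200:])

# Manually refitting with best_params_ on only the training fold
# (same random_state) reproduces the fold score.
clf = RandomForestClassifier(random_state=0, **gs.best_params_)
clf.fit(X[:200], y[:200])
manual_score = roc_auc_score(y[200:], clf.predict_proba(X[200:])[:, 1])

print(fold_score, refit_score, manual_score)
```

Here refit_score comes out higher than fold_score (the refit model memorized the test fold), while manual_score matches fold_score exactly.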