python · machine-learning · scikit-learn · regression · cross-validation

Implementing GridSearchCV with scorer for Leave One Out Cross-Validation


I am attempting to use scikit-learn's GridSearchCV with Gaussian Process Regression (GPR). My dataset is small (~200 points), so I would like to use leave-one-out cross-validation (LOOCV) to evaluate the model. My setup is:

from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.gaussian_process import GaussianProcessRegressor, kernels

param_grid = {
    'kernel':[kernels.RBF(),kernels.Matern(length_scale=0.1)],
    'n_restarts_optimizer':[5,10,20,25],
    'random_state':[30]
}
res_GPR = GridSearchCV(estimator=GaussianProcessRegressor(),
                       param_grid=param_grid,
                       cv=LeaveOneOut(),
                       verbose=20,
                       n_jobs=-1)
res_GPR.fit(X,y)

where X and y are my data points and target values respectively. I know that the default scoring for GPR is R², which is undefined when a test fold contains only a single element, as in LOOCV — this is confirmed by the `.best_score_` attribute of the fitted search being NaN. Instead, I would like the model to be scored by the root mean squared error (RMSE) on each held-out point, averaged over all folds. How can I do that?


Solution

  • GridSearchCV accepts a scoring argument, which you can set to negative RMSE:

    res_GPR = GridSearchCV(estimator=GaussianProcessRegressor(),
                           param_grid=param_grid,
                           cv=LeaveOneOut(),
                           verbose=20,
                           n_jobs=-1,
                           scoring='neg_root_mean_squared_error')
    

    See the documentation and the list of available scores for more.
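For reference, here is a minimal runnable sketch of the full setup on synthetic data. The sine target, the 30-point sample, and the reduced parameter grid are illustrative stand-ins (LOOCV over a large grid is slow); only the `scoring='neg_root_mean_squared_error'` argument is the actual fix.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.gaussian_process import GaussianProcessRegressor, kernels

# Illustrative synthetic data: noisy sine over 30 points.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

# Reduced grid to keep the LOOCV example fast.
param_grid = {
    'kernel': [kernels.RBF(), kernels.Matern(length_scale=0.1)],
    'n_restarts_optimizer': [0, 2],
}

search = GridSearchCV(estimator=GaussianProcessRegressor(random_state=30),
                      param_grid=param_grid,
                      cv=LeaveOneOut(),
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1)
search.fit(X, y)

# With LOOCV, each fold's RMSE is just the absolute error on the single
# held-out point; best_score_ is the negated mean across all folds.
print(-search.best_score_)  # mean per-point RMSE of the best model
```

Note that scikit-learn scorers follow a "greater is better" convention, so `best_score_` is negative; negate it to recover the averaged RMSE.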