I am attempting to implement scikit-learn's GridSearchCV for Gaussian Process Regression (GPR). I'm using a small dataset of ~200 points, and would like to use LOOCV as a performance evaluator for my model. My setup is:
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.gaussian_process import GaussianProcessRegressor, kernels
param_grid = {
    'kernel': [kernels.RBF(), kernels.Matern(length_scale=0.1)],
    'n_restarts_optimizer': [5, 10, 20, 25],
    'random_state': [30],
}
res_GPR = GridSearchCV(
    estimator=GaussianProcessRegressor(),
    param_grid=param_grid,
    cv=LeaveOneOut(),
    verbose=20,
    n_jobs=-1,
)
res_GPR.fit(X, y)
where X and y are my data points and target values respectively. I know that GPR's default scoring metric is R², which is undefined when the test set contains a single element, as it does in LOOCV; this is confirmed by the NaN I get for the fitted model's .best_score_ attribute. Instead, I would like each test case to be scored with the Root Mean Squared Error (RMSE), averaged over all the iterations. How can I do that?
GridSearchCV includes a scoring argument, which you can use to score with negative RMSE:
res_GPR = GridSearchCV(
    estimator=GaussianProcessRegressor(),
    param_grid=param_grid,
    cv=LeaveOneOut(),
    verbose=20,
    n_jobs=-1,
    scoring='neg_root_mean_squared_error',
)
See the documentation and the list of available scores for more.
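For completeness, here is a minimal runnable sketch of the whole setup. The data is synthetic and the grid is reduced for speed, since your X and y are not shown; those parts are assumptions, not your actual configuration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.gaussian_process import GaussianProcessRegressor, kernels

# Synthetic stand-in data (assumed; replace with your own X and y).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=30)

# Reduced grid for illustration; your original used more restart values.
param_grid = {
    'kernel': [kernels.RBF(), kernels.Matern(length_scale=0.1)],
    'n_restarts_optimizer': [0, 2],
}

search = GridSearchCV(
    estimator=GaussianProcessRegressor(random_state=30),
    param_grid=param_grid,
    cv=LeaveOneOut(),
    scoring='neg_root_mean_squared_error',
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_)
# Scores are negated so that higher is better; flip the sign to read RMSE.
print(-search.best_score_)
```

Note that with LeaveOneOut each test fold holds a single point, so the per-fold RMSE reduces to that point's absolute error, and the averaged score is effectively the mean absolute error over all points.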