
Define kernel in scikit GaussianProcessRegressor using BayesSearchCV


Question: How do I define the kernel of a Gaussian Process Regressor using BayesSearchCV?

I'm trying to optimize the hyperparameters of a Gaussian process model using BayesSearchCV from skopt. It seems that I'm defining the kernel wrong, because I get a TypeError:

TypeError: Cannot clone object ''rbf'' (type <class 'str'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.

Dummy-Code:

from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern

X,y = make_regression(100,10)

estimator = GaussianProcessRegressor()

param = {
    'kernel': ['rbf','matern'],
    'n_restarts_optimizer': (5,10),
    'alpha': (1e-5, 1e-2,'log-uniform')
}

opt = BayesSearchCV(
    estimator=estimator,
    search_spaces=param,
    cv=3,
    scoring="r2",
    random_state=42,
    n_iter=3,
    verbose=1,
)   

opt.fit(X, y)

Solution

  • First, GaussianProcessRegressor (GPR) does not seem to accept string aliases such as 'rbf' for its kernel, at least in the current release; it expects a kernel object such as RBF() or Matern(). That raises another issue, however: if you supply the kernel parameter as a list of kernel objects (e.g. [RBF(), Matern()]), skopt is unable to process it and fails with an "unhashable type" error. As far as I'm aware this is still an open issue, though there's a proposed workaround at the bottom of the issue page.

    Another possible workaround is to construct several base estimators, each with its own kernel, and let BayesSearchCV choose between them as the single step of a Pipeline:

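    from sklearn.datasets import make_regression
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, Matern
    from sklearn.pipeline import Pipeline
    from skopt import BayesSearchCV
    from skopt.space import Categorical

    X, y = make_regression(100, 10)

    # One GPR per candidate kernel. Estimator objects are hashable (unlike
    # the kernel objects themselves), so skopt can use them as categories.
    estimator_list = [GaussianProcessRegressor(kernel=RBF()),
                      GaussianProcessRegressor(kernel=Matern())]

    # Single-step pipeline; the search swaps in one of the estimators above.
    pipe = Pipeline([('estimator', GaussianProcessRegressor())])

    param = {
        'estimator': Categorical(estimator_list),
        # Nested parameters of whichever estimator is currently selected
        'estimator__n_restarts_optimizer': (5, 10),
        'estimator__alpha': (1e-5, 1e-2, 'log-uniform'),
    }

    opt = BayesSearchCV(
        estimator=pipe,
        search_spaces=param,
        cv=3,
        scoring="r2",
        random_state=42,
        n_iter=3,
        verbose=1,
    )

    opt.fit(X, y)

    # Fitted kernel (with optimized hyperparameters) of the best pipeline
    print(opt.best_estimator_.named_steps['estimator'].kernel_)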
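To see both failure modes described above directly, here is a small check that uses only scikit-learn. The helper names `fit_fails_with_string_kernel` and `is_hashable` are made up for illustration. It fits a GPR with `kernel="rbf"`, which should raise a `TypeError`, and tests whether kernel objects and GPR estimators can be hashed, which (judging by the "unhashable type" error) is what skopt needs for categorical values. The hashability result reflects current scikit-learn internals (kernel classes define `__eq__` without `__hash__`), so treat it as version-dependent.

```python
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

X, y = make_regression(n_samples=30, n_features=3, random_state=0)


def fit_fails_with_string_kernel():
    """Hypothetical helper: True if fitting a GPR with kernel='rbf' raises TypeError."""
    try:
        # Recent releases reject the string during parameter validation,
        # older ones when cloning the kernel; both raise a TypeError (subclass).
        GaussianProcessRegressor(kernel="rbf").fit(X, y)
    except TypeError:
        return True
    return False


def is_hashable(obj):
    """Hypothetical helper: True if obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


string_kernel_rejected = fit_fails_with_string_kernel()
# Kernel objects: unhashable, so a Categorical([RBF(), Matern()]) breaks
kernels_hashable = [is_hashable(k) for k in (RBF(), Matern())]
# Estimators wrapping those kernels: hashable by identity
estimators_hashable = [
    is_hashable(GaussianProcessRegressor(kernel=k)) for k in (RBF(), Matern())
]
print(string_kernel_rejected, kernels_hashable, estimators_hashable)
```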
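If you'd rather keep plain string names in the search space, as in the question, one alternative (not the workaround from the issue page) is a thin wrapper estimator that maps a name to a kernel inside `fit`. The class `NamedKernelGPR` and its name-to-kernel dictionary below are hypothetical and expose only `kernel`, `alpha` and `n_restarts_optimizer`; this is a sketch following the usual scikit-learn estimator conventions, not a library API.

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern


class NamedKernelGPR(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: picks the GPR kernel by a plain string name.

    Strings are hashable and clone-safe, so a search space such as
    {'kernel': ['rbf', 'matern']} can refer to them directly.
    """

    def __init__(self, kernel="rbf", alpha=1e-10, n_restarts_optimizer=0):
        # Store constructor arguments unchanged (scikit-learn convention).
        self.kernel = kernel
        self.alpha = alpha
        self.n_restarts_optimizer = n_restarts_optimizer

    def _make_kernel(self):
        # Build a fresh kernel object from its name on every fit.
        kernels = {"rbf": RBF, "matern": Matern}
        if self.kernel not in kernels:
            raise ValueError(f"unknown kernel name: {self.kernel!r}")
        return kernels[self.kernel]()

    def fit(self, X, y):
        self.gpr_ = GaussianProcessRegressor(
            kernel=self._make_kernel(),
            alpha=self.alpha,
            n_restarts_optimizer=self.n_restarts_optimizer,
        )
        self.gpr_.fit(X, y)
        return self

    def predict(self, X, return_std=False):
        return self.gpr_.predict(X, return_std=return_std)


X, y = make_regression(n_samples=40, n_features=3, noise=0.1, random_state=0)
# clone() works because every constructor argument is a plain value.
model = clone(NamedKernelGPR(kernel="matern", alpha=1e-2)).fit(X, y)
y_pred = model.predict(X)
```

With `estimator=NamedKernelGPR()`, the question's original `param` dict (`'kernel': ['rbf', 'matern']`, `'n_restarts_optimizer': (5, 10)`, `'alpha': (1e-5, 1e-2, 'log-uniform')`) should then work as `search_spaces` without the Pipeline indirection, since the categories are plain strings. The trade-off is that every new kernel has to be added to the dictionary by hand, and any other GaussianProcessRegressor argument you want to tune must be forwarded explicitly.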