scikit-learn shap gaussian-process

SHAP values for a Gaussian Process Regressor are all zero


I am trying to compute SHAP values for a Gaussian process regression (GPR) model with the SHAP library, but every SHAP value comes out as zero. I am using the example from the official documentation; the only change is that I swapped the model for a GPR.

import sklearn
from sklearn.model_selection import train_test_split
import numpy as np
import shap
import time
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel, ConstantKernel

shap.initjs()

X,y = shap.datasets.diabetes()
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# rather than use the whole training set to estimate expected values, we summarize with
# a set of weighted kmeans, each weighted by the number of points they represent.
X_train_summary = shap.kmeans(X_train, 10)


kernel = Matern(length_scale=2, nu=3/2) + WhiteKernel(noise_level=1)   

gp = GaussianProcessRegressor(kernel)
gp.fit(X_train, y_train)

# explain all the predictions in the test set
explainer = shap.KernelExplainer(gp.predict, X_train_summary)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

Running the above code gives the following plot:

[Figure: SHAP summary plot in which every SHAP value is zero]

With a neural network or a linear regression as the model, the same code works without problems.
Any ideas on how to solve this would be appreciated.


Solution

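  • Your model doesn't predict anything. If its output is (nearly) the same for every sample, there is nothing for SHAP to attribute to the features. Plot the predictions against the true targets to check:

    import matplotlib.pyplot as plt
    plt.scatter(y_test, gp.predict(X_test));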
    

    [Figure: scatter plot of gp.predict(X_test) against y_test for the original model]
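Why a model that "doesn't predict anything" yields all-zero SHAP values: Shapley values split the difference between a prediction and the baseline prediction among the features, so a model that returns the same number for every input leaves nothing to split. The sketch below is not how KernelExplainer works internally (it uses a sampling-based approximation over a background set); it enumerates every coalition exactly for a toy model against a single background point. The names exact_shapley, constant_model and linear_model are made up for this illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Exact Shapley values of predict at x, relative to one background point.

    Features outside a coalition are replaced by their background value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def constant_model(z):
    # Hypothetical stand-in for a model whose output never changes
    return 0.0


def linear_model(z):
    # Hypothetical model that actually depends on its inputs
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]

print(exact_shapley(constant_model, x, background))  # [0.0, 0.0, 0.0]
print(exact_shapley(linear_model, x, background))    # approximately [2.0, -2.0, 1.5]
```

The constant model gets zero attribution for every feature, which matches the all-zero summary plot in the question; the linear model gets weight times (value minus background) for each feature.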

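    Train your model properly, with a kernel that actually fits the data (for example DotProduct() + WhiteKernel(), as in the full example at the end of this answer), and check again: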

    plt.scatter(y_test, gp.predict(X_test));
    

    [Figure: scatter plot of gp.predict(X_test) against y_test for the retrained model]

    and you're good to go:

    explainer = shap.KernelExplainer(gp.predict, X_train_summary)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)
    

    [Figure: SHAP summary plot for the retrained model]

    Full reproducible example:

    import shap
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
    from sklearn.model_selection import train_test_split

    X, y = shap.datasets.diabetes()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Summarize the training set with 10 weighted k-means centroids (background data)
    X_train_summary = shap.kmeans(X_train, 10)

    # Linear (dot-product) kernel plus a white-noise term
    kernel = DotProduct() + WhiteKernel()
    gp = GaussianProcessRegressor(kernel)
    gp.fit(X_train, y_train)

    # Explain all the predictions in the test set
    explainer = shap.KernelExplainer(gp.predict, X_train_summary)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)
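
If you prefer numbers to a scatter plot, a rough check like the one below compares the two kernels before any SHAP computation. It is only a sketch: it assumes that sklearn.datasets.load_diabetes is an acceptable stand-in for shap.datasets.diabetes, and the "spread" and "r2" summaries are my own choice of diagnostics, not part of the original answer.

```python
from sklearn.datasets import load_diabetes
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, Matern, WhiteKernel
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Assumption: sklearn's copy of the diabetes data stands in for shap.datasets.diabetes()
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

kernels = {
    "Matern + WhiteKernel": Matern(length_scale=2, nu=3 / 2) + WhiteKernel(noise_level=1),
    "DotProduct + WhiteKernel": DotProduct() + WhiteKernel(),
}

results = {}
for name, kernel in kernels.items():
    gp = GaussianProcessRegressor(kernel).fit(X_train, y_train)
    pred = gp.predict(X_test)
    results[name] = {
        # Range of the predictions: close to zero means a (nearly) constant model
        "spread": float(pred.max() - pred.min()),
        # Coefficient of determination on the held-out set
        "r2": float(r2_score(y_test, pred)),
    }
    print(f"{name}: spread={results[name]['spread']:.3g}, R^2={results[name]['r2']:.3f}")
```

A spread that is tiny compared with the range of y_test means the model is effectively constant, and KernelExplainer will have nothing to attribute.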