python machine-learning shap xai

SHAP values for linear model different from those calculated manually


I trained a linear model to predict house prices, then compared Shapley values computed manually against the values returned by the SHAP library, and they are slightly different.

My understanding is that, for linear models, the Shapley value of a feature is given by:

coeffs * features_of_obs - coeffs * mean(features_in_training_set)

Or, as stated in the SHAP documentation: coef[i] * (x[i] - X.mean(0)[i]), where i is a feature index.
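
As a sanity check on that formula, here is a minimal sketch on synthetic data (the names X_bg, y_bg, model, phi are mine): for a linear model, the per-feature contributions plus the base value E[f(X)] recover the prediction exactly:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_bg = rng.normal(size=(200, 3))               # background / "training" data
y_bg = X_bg @ np.array([1.0, -2.0, 0.5]) + 3.0
model = LinearRegression().fit(X_bg, y_bg)

x = X_bg[0]                                    # one observation to explain
phi = model.coef_ * (x - X_bg.mean(axis=0))    # per-feature Shapley values
base = model.predict(X_bg.mean(axis=0, keepdims=True))[0]  # E[f(X)] = f(E[X]) for a linear model
assert np.isclose(phi.sum() + base, model.predict(x[None, :])[0])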

The question is, why does SHAP return different values from the manual calculation?

Here is the code:

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import shap

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X = X.drop(columns=["Latitude", "Longitude", "AveBedrms"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0,
)

scaler = MinMaxScaler().set_output(transform="pandas").fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

linreg = LinearRegression().fit(X_train, y_train)
coeffs = pd.Series(linreg.coef_, index=linreg.feature_names_in_)

X_test.reset_index(inplace=True, drop=True)
obs = 6188

# manual Shapley value calculation
effect = coeffs * X_test.loc[obs]
effect - coeffs * X_train.mean()

Which returns:

MedInc        0.123210
HouseAge     -0.459784
AveRooms     -0.128162
Population    0.032673
AveOccup     -0.001993
dtype: float64

And the SHAP library returns something slightly different:

explainer = shap.LinearExplainer(linreg, X_train)
shap_values = explainer(X_test)
shap_values[obs]

Here is the result:

.values =
array([ 0.12039244, -0.47172515, -0.12767778,  0.03473923, -0.00251017])

.base_values =
2.0809714707337523

.data =
array([0.25094137, 0.01960784, 0.06056066, 0.07912217, 0.00437137])
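
(For reference, the .data field is just the scaled feature values of the observation itself:)

X_test.loc[obs].values
# array([0.25094137, 0.01960784, 0.06056066, 0.07912217, 0.00437137])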

The explainer is set to ignore feature dependence (the default):

explainer.feature_perturbation

returning

'interventional'
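
For completeness, a quick additivity check (this holds by linearity regardless of which background mean SHAP used, so the discrepancy has to be in the mean itself):

import numpy as np

pred = linreg.predict(X_test.loc[[obs]])[0]
# the SHAP values plus the base value should reproduce the prediction exactly
np.isclose(shap_values[obs].values.sum() + shap_values[obs].base_values, pred)
# expected: True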

Solution

  • TL;DR:
    What SHAP actually treats as the "training set" (the background data) matters.

    Longer answer:

    Your understanding is correct.

    What is not quite right, however, is assuming SHAP uses the supplied background data as-is: it silently applies a hidden transformation behind the scenes, which can be traced like this (note that, for illustration, the explainer below is built with X_test as the background; the same mechanism applies to the X_train background in your original code):

    import pandas as pd
    from sklearn.datasets import fetch_california_housing
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    import shap
    
    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X = X.drop(columns=["Latitude", "Longitude", "AveBedrms"])
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0,
    )
    
    scaler = MinMaxScaler().set_output(transform="pandas").fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    
    linreg = LinearRegression().fit(X_train, y_train)
    coeffs = pd.Series(linreg.coef_, index=linreg.feature_names_in_)
    
    X_test.reset_index(inplace=True, drop=True)
    obs = 6188
    
    explainer = shap.LinearExplainer(linreg, X_test)
    shap_values = explainer(X_test)
    shap_values[obs]
    

    .values =
    array([ 0.15757575, -0.45065211, -0.12948118,  0.03568408, -0.00211654])
    
    .base_values =
    2.023180048641746
    
    .data =
    array([0.25094137, 0.01960784, 0.06056066, 0.07912217, 0.00437137])
    

    vs:

    y_train.mean()
    

    2.0682462451550387
    

    which is already alarming.
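
    Why is that alarming? For an ordinary least squares fit with an intercept, the average prediction over the training set equals y_train.mean(), and for a linear model the prediction at the feature means equals the average prediction. So, had the full training set been the background, the base value would reproduce y_train.mean() (a quick check, using only objects defined above):

    linreg.predict(X_train.mean().to_frame().T)
    # expected: array([2.06824625]), i.e. y_train.mean(); the base value of
    # ~2.0232 must therefore come from a different background dataset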

    To shed some light on what's going on:

    X_train.mean(0).values
    

    array([0.23218765, 0.54154317, 0.03475851, 0.03985979, 0.00382413])
    

    but

    explainer.masker.data.mean(0)
    

    array([0.22695687, 0.53117647, 0.03449285, 0.03624149, 0.00379029])
    

    which hints that a masker was applied under the hood. The masker's data mean is exactly what is actually used in the SHAP value calculation (see the source code):

    explainer.mean
    

    array([0.22695687, 0.53117647, 0.03449285, 0.03624149, 0.00379029])
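
    My reading of the SHAP defaults (an assumption worth checking in your version): when the background data has more than 100 rows, it is wrapped in an Independent masker with max_samples=100, so only a 100-row subsample is kept and explainer.mean is the mean of that subsample:

    explainer.masker.data.shape
    # expected: (100, 5) -- only 100 of the background rows survive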
    

    So, to reconcile the manual calculation with the SHAP output, you have to account for the masker:

    # expected result: centre on the masker's mean, not on X_train's
    (X_test.loc[obs] - explainer.mean) * coeffs
    

    MedInc        0.157576
    HouseAge     -0.450652
    AveRooms     -0.129481
    Population    0.035684
    AveOccup     -0.002117
    dtype: float64
    

    or simply use fewer than 100 data points as the background from the very beginning, so the masker's subsampling never kicks in.
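
    Alternatively, a sketch (assuming the max_samples argument of shap.maskers.Independent, as in recent SHAP versions) that keeps the full background:

    masker = shap.maskers.Independent(X_train, max_samples=X_train.shape[0])
    explainer = shap.LinearExplainer(linreg, masker)
    # now explainer.mean should equal X_train.mean(0), so the manual
    # coeffs * (X_test.loc[obs] - X_train.mean()) calculation should match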