python machine-learning random-forest shap predictive

I'm trying to understand the SHAP values of the predictive model below. What do the outputs of the shap_values call and of rf_explainer.expected_value mean?


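from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# X (features) and Y (labels) are assumed to be defined earlier
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
rf_model = RandomForestClassifier()
rf_model.fit(x_train, y_train)
rf_pred = rf_model.predict(x_test)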


import shap
rf_explainer = shap.TreeExplainer(rf_model, x_train)

rf_vals = rf_explainer.shap_values(x_train)

Output: 100%|===================| 4778/4792 [03:26<00:00]

rf_explainer.expected_value

Output: array([0.5763, 0.4237])

(The summary plot already showed me how much each feature contributes to the model.) Please explain what the numbers in the two outputs mean: 4778/4792 and array([0.5763, 0.4237]).


Solution

  • rf_explainer.expected_value holds the so-called "base values": the model's average ("expected") output over the background dataset, i.e. what the model would predict for a data point before it knows any of that point's feature values. There is one value per class, and they are close, but not exactly equal, to the class frequencies.

    When explaining a model's predictions:

    1. Start with the base values, which are computed from the supplied background dataset and are the same for every data point.
    2. Add the SHAP values for a data point to the base values to arrive at the model's actual prediction for that point. Each SHAP value is the contribution of one feature to that prediction.
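The two steps above can be checked by hand on a toy problem. The sketch below does not use the shap library or TreeExplainer; it computes exact Shapley values by brute-force enumeration for a made-up three-feature model, only to illustrate that base value plus the sum of the per-feature contributions reproduces the prediction. The names model, background, value and shapley_values are hypothetical and exist only for this illustration.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical toy model: returns a score for class 1 from three binary features.
    return 0.2 + 0.5 * x[0] + 0.2 * x[1] * x[2]

# Hypothetical background dataset (plays the role of x_train in the question).
background = [
    (0, 0, 0),
    (1, 0, 1),
    (0, 1, 1),
    (1, 1, 0),
]

def value(x, subset):
    """Mean model output when features in `subset` are taken from x
    and all other features are taken from each background row in turn."""
    total = 0.0
    for b in background:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):  # k = size of the subset that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                without_i = set(s)
                with_i = without_i | {i}
                contrib += weight * (value(x, with_i) - value(x, without_i))
        phi.append(contrib)
    return phi

x = (1, 1, 1)                       # the data point being explained
base_value = value(x, set())        # step 1: mean prediction over the background
phi = shapley_values(x)             # per-feature contributions for this point
reconstructed = base_value + sum(phi)  # step 2: base value + contributions

print("base value:   ", base_value)
print("contributions:", phi)
print("reconstructed:", reconstructed, "vs model output:", model(x))
```

The base value here depends only on the background rows, so it is the same whichever point is explained, while the contributions change from point to point. With a real classifier the same bookkeeping is done once per class, which is why expected_value in the question has two entries.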