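import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X, Y: feature matrix and class labels, defined earlier
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
rf_model = RandomForestClassifier()
rf_model.fit(x_train, y_train)
rf_pred = rf_model.predict(x_test)

rf_explainer = shap.TreeExplainer(rf_model, x_train)
rf_vals = rf_explainer.shap_values(x_train)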
Output: 100%|===================| 4778/4792 [03:26<00:00]
rf_explainer.expected_value
Output: array([0.5763, 0.4237])
The summary plot showed me how much each feature contributes to the model. Could you please explain what the numbers in these two outputs mean: 4778/4792 and array([0.5763, 0.4237])?
rf_explainer.expected_value
holds the so-called "base values": the model's expected output averaged over the background dataset passed to the explainer (here x_train), one value per class. In other words, this is what the model would predict for a sample before any of its feature values are known. These values are close, but not exactly equal, to the class frequencies.
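As a rough check of the "close to class frequencies" claim, the sketch below compares the average predicted probability per class with the observed class frequencies. It does not call shap itself. It assumes the base value is the model output averaged over the background data, and it uses a synthetic dataset from make_classification as a stand-in for the X and Y in the question, so the numbers will differ from those shown above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's X, Y (an assumption for illustration).
X, Y = make_classification(
    n_samples=1000, n_features=8, weights=[0.6, 0.4], random_state=42
)
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(x_train, y_train)

# Average predicted probability per class over the background data.
# Assumption: this is the quantity the base values approximate.
mean_proba = rf_model.predict_proba(x_train).mean(axis=0)

# Observed class frequencies in the same data.
class_freq = np.bincount(y_train) / len(y_train)

print("mean predicted probabilities:", mean_proba)
print("class frequencies:           ", class_freq)
```

The two printed arrays should be similar but not identical, because the first averages the model's probability estimates while the second counts labels directly.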
When explaining a model's predictions: