In the code below, I want to visualize the SHAP values of a random forest model.
The code is in R:
# Load necessary libraries
library(randomForest)
library(DALEX)
library(beeswarm)
data <- my_database
# Splitting the data into features and target
features <- data[, -which(names(data) %in% "Clus.1")]
target <- data$Clus.1
# Train a random forest model
rf_model <- randomForest(features, target)
# Create an explainer object
explainer <- DALEX::explain(rf_model, data = features, y = target)
# Compute SHAP values
shapley_values <- DALEX::predict_parts(explainer, new_observation = features)
# Plot bee swarm
beeswarm(shapley_values$shap_1)
I tried to use the beeswarm package
and got this error:
beeswarm(shapley_values$shap_1)
Error in rep(nms, sapply(x, length)) : invalid 'times' argument
Can you please tell me what is wrong with my beeswarm call,
or suggest other similar packages?
This is the kind of output I am trying to get: [image]
And this is the output I get if I use plot(shapley_values): [image]
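{DALEX} does not support plotting or working with SHAP values of multiple observations. Also note that predict_parts() uses type = "break_down" by default, so you would need type = "shap" to get SHAP values at all. The error itself comes from shapley_values$shap_1: the object returned by predict_parts() has no element called shap_1, so this expression is NULL and beeswarm() gets nothing to plot. Plotting SHAP beeswarm plots is easy with {shapviz}. SHAP values can be calculated with different packages, e.g., {kernelshap}, {fastshap}, or {treeshap}.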
Note that random forests are among the worst models for SHAP: their trees are deep, so predictions are slow, and model-agnostic SHAP algorithms need a large number of predictions.
library(randomForest)
library(kernelshap) # or library(treeshap)
library(shapviz)
# Fit a random forest that predicts Sepal.Length from the other four columns
fit <- randomForest(Sepal.Length ~ ., data = iris)
xvars <- setdiff(colnames(iris), "Sepal.Length")
# Use kernelshap() instead if length(xvars) > 10; subsample bg_X to 100-500 rows
shap_values <- permshap(fit, X = iris, bg_X = iris, feature_names = xvars)
# Convert to a shapviz object and draw the SHAP beeswarm plot
shap_values <- shapviz(shap_values)
sv_importance(shap_values, kind = "bee")
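If you would rather keep the beeswarm package from the question, it can plot the same SHAP values once they are in the right shape. Below is a minimal sketch, assuming kernelshap's documented return value: permshap() returns a list whose S element is an n x p matrix of SHAP values (one row per observation, one column per feature) and whose baseline element is the average prediction on the background data. beeswarm() accepts a data frame or list of numeric vectors and draws one swarm per element, so a data frame of SHAP columns gives one swarm per feature. The seed and the 50-row background sample are choices made here for speed, not part of the code above.

```r
library(randomForest)
library(kernelshap)
library(beeswarm)

set.seed(1)  # the random forest and the background sample are both random
fit <- randomForest(Sepal.Length ~ ., data = iris)
xvars <- setdiff(colnames(iris), "Sepal.Length")

# Background data: a 50-row subsample keeps the number of predictions small
bg <- iris[sample(nrow(iris), 50), ]
ps <- permshap(fit, X = iris, bg_X = bg, feature_names = xvars)

# n x p matrix of SHAP values: one row per flower, one column per feature
S <- ps$S

# One swarm per column of the data frame, i.e. one swarm per feature
beeswarm(as.data.frame(S), pch = 16, cex = 0.5, ylab = "SHAP value")
abline(h = 0, lty = 2)  # a SHAP value of zero means no contribution
```

Unlike sv_importance(), this plain version neither colours points by feature value nor sorts features by importance; {shapviz} does both for you.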