Tags: r, ensemble-learning, shap, iris-dataset, r-ranger

SHAP Importance for Ranger in R


Given a binary classification problem: how can I get the SHAP contributions for the variables of a ranger model?

Sample data:

library(ranger)
library(tidyverse)

# Binary Dataset
df <- iris
df$Target <- if_else(df$Species == "setosa",1,0)
df$Species <- NULL

# Train Ranger Model
model <- ranger(
  x = df %>%  select(-Target),
  y = df %>%  pull(Target))

I have tried several libraries (DALEX, shapr, fastshap, shapper), but I didn't get any solution.

I would like to get a result like the one SHAPforxgboost produces for xgboost, for example:
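
For reference, a minimal sketch of that workflow (this assumes an xgboost model fitted to the same data and is purely illustrative of the output I am after):

library(xgboost)
library(SHAPforxgboost)

# Fit a small xgboost model on the same data (illustration only)
X <- as.matrix(df %>% select(-Target))
xgb_model <- xgboost(data = X, label = df$Target,
                     objective = "binary:logistic",
                     nrounds = 50, verbose = 0)

# Long-format SHAP values and the familiar beeswarm summary plot
shap_long <- shap.prep(xgb_model = xgb_model, X_train = X)
shap.plot.summary(shap_long)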


Solution

  • Good morning! According to what I have found, you can use ranger() with fastshap as follows:

    library(fastshap)
    library(ranger)
    library(tidyverse)
    data(iris)
    # Binary Dataset
    df <- iris
    df$Target <- if_else(df$Species == "setosa",1,0)
    df$Species <- NULL
    x <- df %>%  select(-Target)
    # Train Ranger Model
    model <- ranger(
      x = x,
      y = df %>% pull(Target))
    # Prediction wrapper
    pfun <- function(object, newdata) {
      predict(object, data = newdata)$predictions
    }
    
    # Compute fast (approximate) Shapley values using 10 Monte Carlo repetitions
    system.time({  # estimate run time
      set.seed(5038)
      shap <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10)
    })
    
    # Load required packages
    library(ggplot2)
    theme_set(theme_bw())
    
    # Aggregate Shapley values into mean absolute contributions
    shap_imp <- data.frame(
      Variable = colnames(shap),  # works whether explain() returns a matrix or a data frame
      Importance = apply(shap, MARGIN = 2, FUN = function(x) mean(abs(x)))
    )
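
    As a side note (my addition, not part of the original answer): if you prefer the per-observation Shapley values in long format, similar to what SHAPforxgboost's shap.prep() returns, a minimal sketch using tidyr (loaded with the tidyverse):

    # Optional: one row per observation/feature pair
    shap_long <- as.data.frame(shap) %>%
      mutate(id = row_number()) %>%
      pivot_longer(-id, names_to = "Variable", values_to = "Shapley")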
    

    Then, for example, for variable importance, you can do:

    # Plot Shap-based variable importance
    ggplot(shap_imp, aes(reorder(Variable, Importance), Importance)) +
      geom_col() +
      coord_flip() +
      xlab("") +
      ylab("mean(|Shapley value|)")
    

    [Figure: bar chart of SHAP-based variable importance]
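
    If you also want a SHAPforxgboost-style beeswarm summary, one option (my suggestion, not part of the original answer) is the shapviz package, which accepts a matrix of Shapley values plus the feature data; a minimal sketch:

    library(shapviz)
    sv <- shapviz(as.matrix(shap), X = x)  # pair Shapley values with features
    sv_importance(sv, kind = "beeswarm")   # beeswarm summary plot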

    Also, if you want explanations for individual predictions, the following is possible:

    # Plot individual explanations
    expl <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10, newdata = x[1L, ])
    autoplot(expl, type = "contribution")
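
    You can also build a simple dependence plot by hand from the Shapley values computed above, using only ggplot2 (a sketch; Petal.Length is just an example feature):

    # Shapley value of Petal.Length vs. its feature value, one point per row
    dep <- data.frame(
      Petal.Length = x$Petal.Length,
      Shapley = as.data.frame(shap)[["Petal.Length"]]
    )
    ggplot(dep, aes(Petal.Length, Shapley)) +
      geom_point(alpha = 0.5) +
      ylab("Shapley value for Petal.Length")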
    

    All this information comes from the fastshap vignette, which has more detail: https://bgreenwell.github.io/fastshap/articles/fastshap.html. Check the link and solve your doubts! :)

    [Figure: Shapley value contribution plot for the first observation]
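
    One caveat (my addition, not from the linked vignette): ranger() fitted on a numeric 0/1 target trains a regression forest. If you would rather explain predicted class probabilities directly, a sketch using a probability forest and an adjusted prediction wrapper:

    # Train ranger as a probability forest on the factor target
    model_prob <- ranger(x = x, y = factor(df$Target), probability = TRUE)

    # Wrapper returning the predicted probability of class "1"
    pfun_prob <- function(object, newdata) {
      predict(object, data = newdata)$predictions[, "1"]
    }

    set.seed(5038)
    shap_prob <- fastshap::explain(model_prob, X = x,
                                   pred_wrapper = pfun_prob, nsim = 10)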