
How can I get a variable importance plot for a classification SVM in tidymodels?

I would like to get variable importance (VIP) information for an SVM model that does classification. I found this useful post, "Variable importance plot for support vector machine with tidymodel framework is not working", which shows how to get the plot for a regression model, and I tried to adapt it.

Unfortunately it throws an error. Can anyone please tell me what I am doing wrong?

library(tidymodels)
library(vip)

data(Boston, package = "MASS")

# Make a classification outcome
df <- Boston |>
  mutate(is_big = factor(if_else(medv > 22, 1, 0)))

# Split the data into train and test set
splits <- initial_split(df)
train <- training(splits)
test <- testing(splits)

# Preprocess with recipe
rec <- recipe(
  formula = is_big ~ .,
  data = train
)

svm_spec <- svm_rbf(margin = 0.0937, cost = 20, rbf_sigma = 0.0208) %>%
  set_engine("kernlab") %>%
  set_mode("classification")

# Putting into workflow
svr_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(svm_spec) %>%
  fit(data = train)

svr_fit %>%
  extract_fit_parsnip() %>%
  vip(
    method = "permute", nsim = 5,
    target = "is_big", metric = "roc_auc", event_level = "second",
    geom = "point",
    pred_wrapper =
      function(object, newdata) as.vector(kernlab::predict(object, newdata)),
    train = train
  )
#> Error in `fun()`:
#> ! `estimate` should be a numeric vector, not a character vector.

  • I think there's a bug in the vip() function for tidymodels workflows. You can work around it by calling vi() or vi_permute() directly, or by calling vip() on the raw underlying model fit. Here's a working version using both approaches:

    library(tidymodels)
    library(vip)
    data(Boston, package = "MASS")
    # Make a classification outcome
    df <- Boston |>
      mutate(is_big = factor(if_else(medv > 22, 1, 0)))
    # Split the data into train and test set
    splits <- initial_split(df)
    train <- training(splits)
    test <- testing(splits)
    # Preprocess with recipe
    rec <- recipe(
      formula = is_big ~ .,
      data = train
    )
    svm_spec <- svm_rbf(margin = 0.0937, cost = 20, rbf_sigma = 0.0208) %>%
      set_engine("kernlab") %>%
      set_mode("classification")
    # Putting into workflow
    svr_fit <- workflow() %>%
      add_recipe(rec) %>%
      add_model(svm_spec) %>%
      fit(data = train)
    # Extract the raw underlying fit
    original_fit <- workflows::extract_fit_engine(svr_fit)
    # Prediction wrapper should return a vector of probabilities for the second class
    pfun <- function(object, newdata) {
      kernlab::predict(object, newdata, type = "probabilities")[, 2L]
    }
    # Now this should work
    original_fit %>%
      vip(
        method = "permute",
        nsim = 5,
        target = "is_big", metric = "roc_auc", event_level = "second",
        pred_wrapper = pfun,
        train = train
      )
    # Alternatively, you can define a prediction wrapper for the workflow object 
    # directly; vip() seems to be bugged with tidymodels workflows
    svr_fit %>%
      vi(
        method = "permute",
        nsim = 5,
        target = "is_big", metric = "roc_auc", event_level = "second",
        pred_wrapper = function(object, newdata) {
          predict(object, newdata, type = "prob")[[".pred_1"]]
        },
        train = train
      )
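
    One plausible reading of the error, offered as a sketch rather than a diagnosis: with metric = "roc_auc" the permutation routine needs numeric class probabilities, but kernlab's default predict() type returns class labels, which as.vector() turns into character strings. The toy example below uses a two-class subset of iris (an assumption for illustration, not the Boston data) and a ksvm fitted directly with prob.model = TRUE to compare the two return types.

    ```r
    library(kernlab)

    # Two-class toy data (assumption: a subset of iris, used only for illustration)
    toy <- droplevels(subset(iris, Species != "setosa"))

    # Fit an RBF SVM directly with kernlab; prob.model = TRUE enables probabilities
    fit <- ksvm(Species ~ ., data = toy, kernel = "rbfdot", prob.model = TRUE)

    # Default prediction type is "response": a factor of class labels
    labels <- predict(fit, toy)
    is.factor(labels)
    # as.vector() on a factor yields character strings, not numbers
    is.character(as.vector(labels))

    # type = "probabilities" returns a numeric matrix with one column per class
    probs <- predict(fit, toy, type = "probabilities")
    is.numeric(probs[, 2L])
    ```

    If that reading is right, any wrapper that returns the probability column for the event class, like pfun above, should satisfy a probability-based metric.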

    Careful though! Your example includes leakage, since your binary outcome is a direct function of medv, which is also included as an input; hence the large importance score for the latter.
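
    A minimal sketch of one way to avoid that leakage, assuming it is acceptable to drop medv from the data once the outcome has been derived from it (other options, such as excluding it inside the recipe, would work too):

    ```r
    library(dplyr)

    data(Boston, package = "MASS")

    # Derive the binary outcome from medv, then drop medv so it cannot act as a predictor
    df <- Boston |>
      mutate(is_big = factor(if_else(medv > 22, 1, 0))) |>
      select(-medv)

    # Check that medv is gone and the outcome is still present
    "medv" %in% names(df)
    "is_big" %in% names(df)
    ```

    The rest of the pipeline (split, recipe, workflow, importance scores) can then be run unchanged on this df.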