Tags: r, machine-learning, xgboost, mlr

generatePartialDependenceData() returns an error when used with a multiclass classification model


I have built an XGBoost multiclass classification model using mlr, and I want to visualize the partial dependence for some features. However, if I try to do so using generatePartialDependenceData(), I get the following error:

Error in melt.data.table(as.data.table(out), measure.vars = target, variable.name = if (td$type == : One or more values in 'measure.vars' is invalid.

I have checked for discrepancies between the task.desc in the Task object and the factor.levels in the WrappedModel object, but everything seems fine. Additionally, I have no trouble generating the data for a regression XGBoost model with a different target variable using the same function. Is there a problem on my end, or is this a bug?
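A plausible explanation for why the level check passes (an assumption read off the error message, not confirmed in the thread): for a multiclass task, the reshaping step seems to look for one prediction column per class level (its measure.vars), while a learner with predict.type = "response" returns only a single column of predicted labels. The base-R mock below reproduces that kind of mismatch without mlr; the data frames and the helper missing_measure_vars() are hypothetical stand-ins, not mlr objects.

```r
# Hypothetical mock: the class levels a multiclass reshape would use as measure.vars
class_levels <- c("Adelie", "Chinstrap", "Gentoo")

# Stand-in for "response" predictions: one column of predicted labels
out_response <- data.frame(
  species = factor(c("Adelie", "Gentoo"), levels = class_levels)
)

# Stand-in for "prob" predictions: one probability column per class
out_prob <- data.frame(
  Adelie = c(0.90, 0.10), Chinstrap = c(0.05, 0.10), Gentoo = c(0.05, 0.80)
)

# Columns the reshaping step would ask for but cannot find
missing_measure_vars <- function(out, levels) setdiff(levels, names(out))

missing_measure_vars(out_response, class_levels)  # all three levels are missing
missing_measure_vars(out_prob, class_levels)      # character(0): nothing missing
```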

Here is an example using the palmerpenguins dataset:

# library
library(tidyverse)
library(caret)
library(mlr)

peng <- palmerpenguins::penguins

# data partition
set.seed(1234)
inTrain <- createDataPartition(
  y = peng$species,
  p = 0.7,
  list = FALSE
)

# build task
train_class <- peng[inTrain,] %>% select(-sex, -year) %>% 
  createDummyFeatures(target = "species", cols = "island") %>% 
  makeClassifTask(data = ., target = "species")

# build learners
xgb_class_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "response"  # with "prob" instead, the call below works (see Solution)
)

# build model
xgb_class <- train(xgb_class_learner, train_class)

# generate partial dependence
generatePartialDependenceData(xgb_class, train_class)
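For a classifier, partial dependence is usually computed on predicted class probabilities: a feature is fixed at one grid value for every row, the model predicts, and each class probability is averaged over the rows, giving one curve per class. The base-R sketch below shows that computation in a model-agnostic way. partial_dependence() and the toy softmax model are hypothetical illustrations, not mlr internals, and the sketch assumes the predictor returns one probability column per class.

```r
# Model-agnostic partial dependence for class probabilities (illustrative sketch).
# predict_prob: a function(data) returning one probability column per class.
partial_dependence <- function(predict_prob, data, feature, grid) {
  rows <- lapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value              # hold the feature fixed for every row
    probs <- as.matrix(predict_prob(modified))
    avg <- colMeans(probs)                    # average each class probability over rows
    data.frame(value = value, class = names(avg), prob = unname(avg))
  })
  do.call(rbind, rows)                        # long format: one row per grid value and class
}

# Hypothetical toy model: softmax over linear scores in x (z is ignored)
toy_predict <- function(d) {
  scores <- cbind(A = 0 * d$x, B = d$x, C = -d$x)
  exp(scores) / rowSums(exp(scores))
}

toy_data <- data.frame(x = c(-1, 0, 1), z = c(5, 6, 7))
pd <- partial_dependence(toy_predict, toy_data, "x", grid = c(0, 2))
print(pd)
```

Because the averaging step needs a probability per class, a learner that only outputs labels has nothing to average in this form.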

Solution

  • As pointed out by KacZdr, setting the predict.type argument to "prob" fixes the error:

    # build learners
    xgb_class_learner <- makeLearner(
      "classif.xgboost",
      predict.type = "prob"
    )
    

    However, since Lars Kotthoff mentioned that the mlr package is deprecated, here is alternative code using mlr3. There seems to be an issue with ggplot2 in the $plot() method of FeatureEffects objects; when I try to use it, I get:

    Error in `geom_rug()`:
    ! problem while computing position.
    ℹ Error occurred in the 2nd layer.
    Caused by error in `if (params$width > 0) ...`:
    ! Missing value, where TRUE/FALSE is required

    So I generate the data and plot it myself instead.

    # library
    library(tidyverse)
    library(mlr3)
    library(mlr3learners)
    library(mlr3pipelines)
    library(iml)
    
    peng <- palmerpenguins::penguins
    
    # build task
    tsk_peng <- peng %>% select(-sex, -year) %>% 
      as_task_classif(target = "species")
    
    # data partition
    splits <- partition(tsk_peng)
    
    # build learner
    lrn_classif <- as_learner(po("encode", method = "one-hot") %>>% lrn("classif.xgboost"))
    
    # train model
    lrn_classif$train(tsk_peng, row_ids = splits$train)
    
    # partial dependence
    predictor <- Predictor$new(
      lrn_classif, 
      data = tsk_peng$data(rows = splits$train, cols = tsk_peng$feature_names),
      y = tsk_peng$data(rows = splits$train, cols = tsk_peng$target_names)
    )
    
    effect <- FeatureEffects$new(predictor, method = "pdp")
    
    # plot
    ## continuous features (every feature except the factor island)
    effect$results[setdiff(effect$features, "island")] %>% 
      bind_rows() %>% 
      ggplot(aes(x = .borders, y = .value, col = .class))+
      geom_line()+
      facet_grid(~.feature, scales = "free")
    
    ## factor
    effect$results$island %>% 
      ggplot(aes(x = .borders, y = .value, fill = .class))+
      geom_bar(stat = "identity", position = "dodge")
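The plotting code above picks the continuous features explicitly. A small base-R helper could instead split effect$results by the type of the .borders column. This is a sketch that assumes, as the code above suggests, that each element of effect$results is a data frame with .borders, .value, .class and .feature columns, where .borders is numeric for continuous features and non-numeric (for example a factor) for categorical ones; split_effects() and the mock results are hypothetical.

```r
# Split per-feature PDP tables into continuous and categorical parts,
# based on whether the .borders column is numeric.
split_effects <- function(results) {
  is_continuous <- vapply(results, function(d) is.numeric(d$.borders), logical(1))
  list(
    continuous  = do.call(rbind, results[is_continuous]),
    categorical = do.call(rbind, results[!is_continuous])
  )
}

# Hypothetical mock of the results structure, for illustration only
mock_results <- list(
  bill_length_mm = data.frame(.borders = c(35, 45), .value = c(0.7, 0.2),
                              .class = "Adelie", .feature = "bill_length_mm"),
  island = data.frame(.borders = factor(c("Biscoe", "Dream")), .value = c(0.3, 0.5),
                      .class = "Adelie", .feature = "island")
)

parts <- split_effects(mock_results)
str(parts)
```

With the real object, parts$continuous and parts$categorical could feed the line plot and the bar plot respectively, so the selection no longer depends on feature names or their order.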