I have built an XGBoost multiclass classification model using mlr and I want to visualize the partial dependence for some features. However, when I try to do so using generatePartialDependenceData(), I get the following error:
Error in melt.data.table(as.data.table(out), measure.vars = target, variable.name = if (td$type == : One or more values in 'measure.vars' is invalid.
I have checked for discrepancies between the task.desc in the Task object and the factor.levels in the WrappedModel object, but everything seems fine. Additionally, I have no trouble generating the data for a regression XGBoost model with a different target variable using the same function.
Is there a problem on my end, or is this a bug?
Here is an example using the palmerpenguins dataset:
# library
library(tidyverse)
library(caret)
library(mlr)
peng <- palmerpenguins::penguins
# data partition
set.seed(1234)
inTrain <- createDataPartition(
y = peng$species,
p = 0.7,
list = FALSE
)
# build task
train_class <- peng[inTrain,] %>% select(-sex, -year) %>%
createDummyFeatures(target = "species", cols = "island") %>%
makeClassifTask(data = ., target = "species")
# build learners
xgb_class_learner <- makeLearner(
"classif.xgboost",
predict.type = "response"
)
# build model
xgb_class <- mlr::train(xgb_class_learner, train_class) # caret also exports train(), so be explicit
# generate partial dependence
generatePartialDependenceData(xgb_class, train_class)
As mentioned by KacZdr, setting the predict.type argument to "prob" works fine.
# build learners
xgb_class_learner <- makeLearner(
"classif.xgboost",
predict.type = "prob"
)
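For completeness, here is a minimal self-contained sketch of the working mlr version. It assumes mlr, xgboost, dplyr and palmerpenguins are installed. To keep it short it trains on the full data rather than the caret partition, and the single feature passed to features is only an illustrative choice.

```r
# Sketch only: assumes mlr, xgboost, dplyr and palmerpenguins are installed.
library(mlr)
library(dplyr)

# Same preprocessing as in the question, on the full data for brevity.
peng <- palmerpenguins::penguins %>%
  select(-sex, -year) %>%
  as.data.frame()

task <- peng %>%
  createDummyFeatures(target = "species", cols = "island") %>%
  makeClassifTask(data = ., target = "species")

# predict.type = "prob" is the only change from the failing setup.
learner <- makeLearner("classif.xgboost", predict.type = "prob")
model <- mlr::train(learner, task)

# Partial dependence for one feature (illustrative choice), then mlr's own plot.
pd <- generatePartialDependenceData(model, task, features = "bill_length_mm")
plotPartialDependence(pd)
```

Leaving out the features argument, as in the question, computes the partial dependence for every feature in the task.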
However, since Lars Kotthoff mentioned that the mlr package is deprecated, here is alternative code using mlr3, with iml for the partial dependence. There seems to be an issue with ggplot in the $plot() function for FeatureEffects objects; when I try using it I get:
Error in `geom_rug()`:
! problem while computing position.
i Error occured in the 2nd layer.
Caused by error in `if (params$width > 0) ...`:
! Missing value, where TRUE/FALSE is required
So I just generate the data and plot it myself.
# library
library(tidyverse)
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(iml)
peng <- palmerpenguins::penguins
# build task
tsk_peng <- peng %>% select(-sex, -year) %>%
as_task_classif(target = "species")
# data partition
splits <- partition(tsk_peng)
# build learner
lrn_classif <- as_learner(po("encode", method = "one-hot") %>>% lrn("classif.xgboost"))
# train model
lrn_classif$train(tsk_peng, row_ids = splits$train)
# partial dependence
predictor <- Predictor$new(
lrn_classif,
data = tsk_peng$data(rows = splits$train, cols = tsk_peng$feature_names),
y = tsk_peng$data(rows = splits$train, cols = tsk_peng$target_names)
)
effect <- FeatureEffects$new(predictor, method = "pdp")
# plot
## continuous
effect$results %>%
keep(names(.) %in% effect$features[1:4]) %>%
bind_rows() %>%
ggplot(aes(x = .borders, y = .value, col = .class))+
geom_line()+
facet_grid(~.feature, scales = "free")
## factor
effect$results$island %>%
ggplot(aes(x = .borders, y = .value, fill = .class))+
geom_bar(stat = "identity", position = "dodge")