I am trying to find a way to ensure my mlr3pipelines pipeline is working as expected. I have a classbalancing pipeline and am trying to view properties of the data being given to my model and how it is split for training/testing. I suspect a much larger portion than I want is being used for training/testing during each iteration.
graph_lr <- po('classbalancing',
               adjust = 'downsample',
               reference = 'minor',
               ratio = 5) %>>%
  po("encode", method = 'treatment') %>>%
  po("scale") %>>%
  lrn("classif.cv_glmnet",
      predict_type = 'prob',
      type.measure = 'auc',
      predict_sets = c("train", "test"))
graphLearner_lr <- GraphLearner$new(graph_lr)
I intend to downsample the majority class (binary problem) to 5 times the size of the minority class. The graph learner is then resampled.
lr_resample <- mlr3::resample(task = task_lr,
graphLearner_lr,
outerResamp,
store_models = TRUE,
store_backends = TRUE)
How can I view properties of the downsampled data (such as the number of rows, row indices, etc.)? I have tried looking in the individual learners and elsewhere in the ResampleResult, but have been unable to find anything.
You can use the $keep_results flag of Graph to store the intermediate tasks. The $data() method returns the data.
library(mlr3verse)
library(mlr3learners)
task = tsk("spam")
graph = po("classbalancing", adjust = "downsample", reference = "minor", ratio = 5) %>>%
po("encode", method = "treatment") %>>%
po("scale") %>>%
lrn("classif.cv_glmnet", predict_type = "prob", type.measure = "auc", predict_sets = c("train", "test"))
graph$keep_results = TRUE
graph_learner = as_learner(graph)
rr = resample(task, graph_learner, rsmp("cv", folds = 3), store_models = TRUE, store_backends = TRUE)
trained_learner_1 = rr$learners[[1]]
# Task of iteration 1 after class balancing
trained_learner_1$graph$pipeops$classbalancing$.result$output
# Task of iteration 1 after class balancing and encoding
trained_learner_1$graph$pipeops$encode$.result$output
# Task of iteration 1 after class balancing, encoding and scaling
trained_learner_1$graph$pipeops$scale$.result$output
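Each of the stored outputs above is an ordinary mlr3 Task, so the properties asked about (number of rows, row ids, class counts) can be read from it directly. The following is a minimal standalone sketch rather than part of the original answer: it applies only the classbalancing PipeOp to the training rows of one fold, outside of resample(), which makes the check easy to reproduce. The names train_ids, train_task, pob and balanced are made up for illustration, and ratio = 1 is used instead of 5 only so that the downsampling is visible on spam, where the classes are only mildly imbalanced.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("spam")

# Instantiate the resampling so the train/test split is fixed
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# Row ids that would be used for training in iteration 1
train_ids = resampling$train_set(1)
train_task = task$clone()$filter(train_ids)

# Apply only the class balancing step to the training rows
# (ratio = 1 is an assumption for this sketch, not the question's setting)
pob = po("classbalancing", adjust = "downsample", reference = "minor", ratio = 1)
balanced = pob$train(list(train_task))[[1]]

train_task$nrow              # rows before balancing
balanced$nrow                # rows after balancing
table(train_task$truth())    # class counts before balancing
table(balanced$truth())      # class counts after balancing
head(balanced$row_ids)       # row ids that were kept
head(balanced$data())        # the data itself
```

The same accessors ($nrow, $row_ids, $truth(), $data()) should work on the tasks stored in $.result$output, so comparing them against rr$resampling$train_set(i) is one way to check how many rows reach the model in each iteration.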