The following code throws an error. Thank you in advance for your help.
require(mlr3)
require(mlr3proba)
require(mlr3learners)
require(mlr3tuning)
require(mlr3pipelines)
require(mlr3verse)
require(mlr3viz)
#- require(mlr3fda)
require(survival)
require(glmnet)
require(splines)
require(data.table) # for as.data.table() used in apply_splines() below
# Simulate a survival dataset
set.seed(123)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
time <- rexp(n, rate = 1)
status <- sample(0:1, n, replace = TRUE)
df <- as.data.frame(X)
df$time <- time
df$status <- status
# Create a survival task
task <- TaskSurv$new("survival_task", backend = df, time = "time", event = "status")
task
#---- Perform initial split
initial_split <- rsmp("holdout")
initial_split$instantiate(task)
# Separate the data into training and testing sets
train_task <- task$clone()$filter(initial_split$train_set(1))
test_task <- task$clone()$filter(initial_split$test_set(1))
#---- Load the glmnet learner
learner <- lrn("surv.glmnet")
#---- Define the hyperparameter search space
search_space <- ps(
  alpha = p_dbl(lower = 0, upper = 1),
  lambda = p_dbl(lower = 0.0001, upper = 0.1, logscale = TRUE)
)
#---- Define objects needed for tuning
#---- Create a pipeline for splines transformation
#- library(paradox)
#- Define a function to apply splines transformation
apply_splines <- function(x) {
as.data.table(splines::ns(x, df = 3))
}
#- Define the pipeline graph for applying splines transformation
graph <- gunion(list(
  po("colapply", id = "spline_V1", applicator = apply_splines,
     affect_columns = selector_name("V1")),
  po("colapply", id = "spline_V2", applicator = apply_splines,
     affect_columns = selector_name("V2")),
  po("colapply", id = "spline_V3", applicator = apply_splines,
     affect_columns = selector_name("V3"))
)) %>>%
  po("featureunion") %>>%
  learner
#-- Create the pipeline learner
pipeline <- GraphLearner$new(graph)
#--- Define the resampling strategy for tuning
resampling <- rsmp("cv", folds = 5)
# Define the performance measure for survival analysis
measure <- msr("surv.cindex")
# Create the tuner
tuner <- tnr("grid_search", resolution = 5)
#-- Define the AutoTuner
at <- AutoTuner$new(
learner = pipeline,
resampling = resampling,
measure = measure,
search_space = search_space,
terminator = trm("evals", n_evals = 20),
tuner = tuner
)
# Train the AutoTuner on the training set
at$train(train_task)
... part of the output is omitted
INFO [18:08:27.654] [mlr3] Finished benchmark
INFO [18:08:27.692] [bbotk] Result of batch 20:
INFO [18:08:27.694] [bbotk] alpha lambda surv.cindex warnings errors runtime_learners
INFO [18:08:27.694] [bbotk] 0.25 -7.483402 0.4561424 0 0 1.52
INFO [18:08:27.694] [bbotk] uhash
INFO [18:08:27.694] [bbotk] a491c12c-47e5-448b-b365-34aa53350e01
INFO [18:08:27.711] [bbotk] Finished optimizing after 20 evaluation(s)
INFO [18:08:27.712] [bbotk] Result:
INFO [18:08:27.714] [bbotk] alpha lambda learner_param_vals x_domain surv.cindex
INFO [18:08:27.714] [bbotk] <num> <num> <list> <list> <num>
INFO [18:08:27.714] [bbotk] 0.75 -4.029524 <list[8]> <list[2]> 0.4561424
Error in self$assert(xs, sanitize = TRUE) :
Assertion on 'xs' failed: Parameter 'alpha' not available. Did you mean 'spline_V1.applicator' / 'spline_V1.affect_columns' / 'spline_V2.applicator'?.
You define a GraphLearner that has a learner somewhere inside it. When you define the AutoTuner, you provide the search_space of the learner, not of the learner inside the larger GraphLearner. The difference is that for the learner on its own, the parameters that need tuning are named alpha and lambda, while inside the GraphLearner they are named surv.glmnet.alpha and surv.glmnet.lambda. This triggers warnings, as many lambdas are actually fitted (pretty much, the search_space is not used at all in your case, I think). You can see that if in your AutoTuner you just used the learner, then things would work normally.
This is more general: the GraphLearner constructs parameter names as <pipeop_id>.<arg_name> to be able to differentiate between the parameters of the different PipeOps.
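You can verify this by listing the parameter ids that the GraphLearner actually exposes (a quick check, using the pipeline object from your code):
pipeline$param_set$ids()
# includes e.g. "spline_V1.applicator", ..., "surv.glmnet.alpha", "surv.glmnet.lambda"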
Solution No 1: define the search_space on the learner itself (when the GraphLearner gets constructed, the prefix of the parameters is automatically added):
learner = lrn("surv.glmnet")
learner$param_set$set_values(.values = list(
  alpha = to_tune(0, 1),
  lambda = to_tune(p_dbl(0.001, 0.1, logscale = TRUE))
))
Note that in this case you DO NOT need to use the search_space argument in AutoTuner.
Solution No 2: keep the search_space argument, but prefix the parameter names with the id = surv.glmnet of the learner, i.e.:
search_space = ps(
  surv.glmnet.alpha = p_dbl(lower = 0, upper = 1),
  surv.glmnet.lambda = p_dbl(lower = 0.0001, upper = 0.1, logscale = TRUE)
)
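With this, your AutoTuner from the question stays the same apart from the search_space (a sketch; all other objects as you defined them):
at <- AutoTuner$new(
  learner = pipeline,
  resampling = resampling,
  measure = measure,
  search_space = search_space, # the prefixed version above
  terminator = trm("evals", n_evals = 20),
  tuner = tuner
)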
Two more tips. You only need one colapply, as the same operation is applied to all columns (see the PipeOpColApply examples and the sketch below). Also, partition() gives you a simpler train/test split:
# simple train/test split
part = partition(task)
at$train(task, row_ids = part$train)
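For the single colapply, a minimal sketch (the id "splines" is my choice; leaving out affect_columns means the default selector applies the applicator to every feature column):
graph <- po("colapply", id = "splines", applicator = apply_splines) %>>% learner
pipeline <- GraphLearner$new(graph)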
Finally, the auto_tuner() sugar function is a shorter way to build the AutoTuner:
at = auto_tuner(
  learner = pipeline, # better name => grlrn, as it has the learner inside (with "solution No 1" above)
  resampling = resampling,
  measure = measure,
  tuner = tuner,
  term_evals = 20
)
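And to evaluate on the held-out rows afterwards (a sketch, reusing part from the partition() snippet above):
pred = at$predict(task, row_ids = part$test)
pred$score(msr("surv.cindex"))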