I want to extract the standard deviation and/or a 95% confidence interval of the results obtained in a benchmark of multiple learners on a task, so that the reported results are complete. I read this: mlr3 standard deviation for k-fold cross-validation resampling
But I don't know whether anything new has been added since February...
Here is my code and the plots of the bmr:
# Outer and inner resampling for the nested CV
resampling_outer = rsmp("cv", folds = 5)
resampling_inner = rsmp("cv", folds = 3)

set.seed(372)
resampling_outer$instantiate(task_wilcox)
resampling_inner$instantiate(task_wilcox)

# One auto-tuner per learner, tuned with MBO on the inner resampling
at_xgboost = auto_tuner(tuner = tnr("mbo"), learner = xgboost, resampling = resampling_inner,
  measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_ranger = auto_tuner(tuner = tnr("mbo"), learner = ranger, resampling = resampling_inner,
  measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_svm = auto_tuner(tuner = tnr("mbo"), learner = svm, resampling = resampling_inner,
  measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_knn = auto_tuner(tuner = tnr("mbo"), learner = knn, resampling = resampling_inner,
  measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)

learners = list(at_xgboost, at_svm, at_ranger, at_knn)
measures = msrs(c("classif.auc", "classif.bacc", "classif.bbrier"))

# Benchmarking on the outer resampling
set.seed(372)
design = benchmark_grid(tasks = task_wilcox, learners = learners, resamplings = resampling_outer)
bmr = benchmark(design, store_models = TRUE)

results = bmr$aggregate(measures)
print(results)

autoplot(bmr, measure = msr("classif.auc"))
autoplot(bmr, type = "roc")
results
nr task_id learner_id resampling_id iters classif.auc classif.bacc classif.bbrier
1: 1 data_wilcox scale.xgboost.tuned cv 5 0.6112939 0.5767294 0.2326787
2: 2 data_wilcox scale.svm.tuned cv 5 0.5226407 0.5010260 0.1893202
3: 3 data_wilcox scale.random_forest.tuned cv 5 0.6200084 0.5614843 0.2229120
4: 4 data_wilcox scale.knn.tuned cv 5 0.5731675 0.5002955 0.1917721
extract_inner_tuning_results(bmr)[,list(learner_id, classif.auc)]
learner_id classif.auc
1: scale.xgboost.tuned 0.6231350
2: scale.xgboost.tuned 0.6207103
3: scale.xgboost.tuned 0.6175323
4: scale.xgboost.tuned 0.6195693
5: scale.xgboost.tuned 0.6222398
6: scale.svm.tuned 0.5891432
7: scale.svm.tuned 0.5837583
8: scale.svm.tuned 0.5767444
9: scale.svm.tuned 0.6027165
10: scale.svm.tuned 0.6082825
11: scale.random_forest.tuned 0.6287649
12: scale.random_forest.tuned 0.6165179
13: scale.random_forest.tuned 0.6288599
14: scale.random_forest.tuned 0.6259322
15: scale.random_forest.tuned 0.6234295
16: scale.knn.tuned 0.5931790
17: scale.knn.tuned 0.5926835
18: scale.knn.tuned 0.5931790
19: scale.knn.tuned 0.5929156
20: scale.knn.tuned 0.5929156
As you can see on the ROC curve, there are standard deviations or CIs shown as the coloured, translucent bands, but how can I extract them? I suppose that for the standard deviation I have to collect the results over all the outer resampling folds of my nested CV, but there seems to be no way to extract it directly (it appears on the ROC curve via the band, so I suppose it exists somewhere...).
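Something like the following is what I have in mind (just a sketch: per-fold AUC from the outer resampling via bmr$score(), then a standard deviation and a normal-approximation 95% CI per learner), but I don't know if it is the intended way:

library(data.table)
# Sketch: one row per outer fold, then SD and a rough 95% CI per learner
scores = bmr$score(msr("classif.auc"))
scores[, .(
  mean_auc = mean(classif.auc),
  sd_auc   = sd(classif.auc),
  ci_lower = mean(classif.auc) - 1.96 * sd(classif.auc) / sqrt(.N),
  ci_upper = mean(classif.auc) + 1.96 * sd(classif.auc) / sqrt(.N)
), by = learner_id]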
Second question: on the box plots of the AUC for each learner, I don't really understand how the box plots are built, since they don't seem to correspond to the results on the test sets (outer loop => resampling_outer)...
Last question: do you know how to customize the ROC curves in mlr3? For example, to add the AUC to the plot, or to remove the band around the curve...
Thanks!
predict_type = "se"
, see https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-predicting. This is computed by the learner internally and not over folds of a cross-validation, but most likely what you want.mlr3
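For illustration, a minimal sketch with a regression learner (the task and learner below, tsk("mtcars") and lrn("regr.lm"), are only placeholders, not from your code):

library(mlr3)
library(mlr3learners)
# Sketch: standard errors computed by the learner itself, not across CV folds
task = tsk("mtcars")
learner = lrn("regr.lm", predict_type = "se")
learner$train(task)
prediction = learner$predict(task)
head(prediction$se)  # per-observation standard errors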
mlr3 plots return ggplot2 objects; you can customize them in the same way as other plots.
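For example (a sketch; the title and theme below are just illustrative, not part of the mlr3 API):

library(mlr3viz)
library(ggplot2)
# autoplot() returns a ggplot object, so further layers can be added
p = autoplot(bmr, type = "roc")
p + labs(title = "ROC curves on the outer CV folds") + theme_minimal()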