I am using the package "tuneRanger" to tune a random forest model. It works well and I obtained good results, but I am not sure whether it is overfitting my model. I would like to use repeated CV for every configuration the package tries, but I can't find a way to do it. I would also like to know how the package validates the result of each try (train/test split, CV, repeated CV?). I have been reading the package documentation (https://cran.r-project.org/web/packages/tuneRanger/tuneRanger.pdf) but it says nothing about it.
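For reference, this is roughly how I am calling it (a minimal sketch based on the package README, with iris standing in for my data and multiclass.brier for my actual measure):

library(tuneRanger)
library(mlr)

# tuneRanger works on an mlr task
task <- makeClassifTask(data = iris, target = "Species")
res <- tuneRanger(task, measure = list(multiclass.brier),
                  num.trees = 1000, iters = 70)
res$recommended.pars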
Thank you for your help.
Out-of-bag estimates are used for estimating the error; I don't think you can switch to CV with that package. It's up to you to decide whether CV is better than this. In their readme they link to a publication, and in Section 3.5 of it they write:
Out-of-bag predictions are used for evaluation, which makes it much faster than other packages that use evaluation strategies such as cross-validation
If you want to use cross-validation or repeated cross-validation, you would have to use caret, for example:
library(caret)
# 10-fold CV repeated 2 times, over a small grid of ranger hyperparameters
mdl <- train(Species ~ ., data = iris, method = "ranger",
             trControl = trainControl(method = "repeatedcv", repeats = 2),
             tuneGrid = expand.grid(mtry = 2:3, min.node.size = 1:2, splitrule = "gini"))
mdl
Random Forest 

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 2 times) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters:

  mtry  min.node.size  Accuracy  Kappa
  2     1              0.96      0.94
  2     2              0.96      0.94
  3     1              0.96      0.94
  3     2              0.96      0.94

Tuning parameter 'splitrule' was held constant at a value of gini
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini
and min.node.size = 1.
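Once training finishes, the selected hyperparameters and the resampled performance for every grid point are stored on the fitted object:

mdl$bestTune   # the winning combination of mtry, splitrule, min.node.size
mdl$results    # resampled Accuracy and Kappa for each grid point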
The parameters you can tune this way will differ from those tuneRanger exposes. I think mlr also allows you to perform cross-validation, but the same limitations apply; a sketch follows.
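For instance, a minimal sketch of tuning ranger through mlr with repeated CV might look like this (grid search over the same small grid as the caret example; treat the exact parameter names as assumptions to be checked against the mlr docs):

library(mlr)

task <- makeClassifTask(data = iris, target = "Species")
lrn  <- makeLearner("classif.ranger")

# same grid as the caret example above
ps <- makeParamSet(
  makeIntegerParam("mtry", lower = 2, upper = 3),
  makeIntegerParam("min.node.size", lower = 1, upper = 2)
)

# 10-fold CV repeated 2 times
rdesc <- makeResampleDesc("RepCV", folds = 10, reps = 2)

res <- tuneParams(lrn, task, resampling = rdesc,
                  par.set = ps, control = makeTuneControlGrid())
res$x  # best hyperparameter setting found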