Tags: r, random-forest, ranger

Repeated CV in TuneRanger


I am using the package "tuneRanger" to tune a RF model. It works well and I obtained good results, but I am not sure whether it is overfitting my model. I would like to use repeated CV for every configuration the package tries while tuning, but I can't find a way to do it. I would also like to know how the package validates the result of each try (train-test split, CV, repeated CV?). I have been reading the package documentation (https://cran.r-project.org/web/packages/tuneRanger/tuneRanger.pdf) but it says nothing about it.

Thank you for your help.


Solution

  • Out-of-bag estimates are used for estimating the error; I don't think you can switch to CV with that package. It is up to you to decide whether CV is better than this. In their README they link to a publication, and in its section 3.5 the authors write:

    Out-of-bag predictions are used for evaluation, which makes it much faster than other packages that use evaluation strategies such as cross-validation
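    For reference, a typical tuneRanger call looks like the sketch below. It builds an mlr task and lets tuneRanger score each tried configuration on the out-of-bag predictions; the measure, tree count, and iteration count here are illustrative choices, not values from the original post:

        library(tuneRanger)
        library(mlr)

        # tuneRanger operates on mlr tasks
        iris.task <- makeClassifTask(data = iris, target = "Species")

        # Tunes mtry, min.node.size and sample.fraction by default.
        # Evaluation uses OOB error, so no resampling strategy can be supplied.
        res <- tuneRanger(iris.task, measure = list(multiclass.brier),
                          num.trees = 500, iters = 30)
        res$recommended.pars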

    If you want to use cross-validation or repeated cross-validation, you would have to use caret, for example:

    library(caret)
    
    mdl <- train(Species ~ ., data = iris, method = "ranger",
                 trControl = trainControl(method = "repeatedcv", repeats = 2),
                 tuneGrid = expand.grid(mtry = 2:3, min.node.size = 1:2,
                                        splitrule = "gini"))
    mdl
    
    Random Forest 
    
    150 samples
      4 predictor
      3 classes: 'setosa', 'versicolor', 'virginica' 
    
    No pre-processing
    Resampling: Cross-Validated (10 fold, repeated 2 times) 
    Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
    Resampling results across tuning parameters:
    
      mtry  min.node.size  Accuracy  Kappa
      2     1              0.96      0.94 
      2     2              0.96      0.94 
      3     1              0.96      0.94 
      3     2              0.96      0.94 
    
    Tuning parameter 'splitrule' was held constant at a value of gini
    Accuracy was used to select the optimal model using the largest value.
    The final values used for the model were mtry = 2, splitrule = gini
     and min.node.size = 1.
    

    The parameters you can tune will be different depending on the method. I think mlr also allows you to perform cross-validation, but the same limitations apply.
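
    For completeness, an mlr-based sketch of the same repeated-CV tuning; the parameter ranges mirror the caret grid above and are illustrative assumptions, not from the post:

        library(mlr)

        task <- makeClassifTask(data = iris, target = "Species")
        lrn  <- makeLearner("classif.ranger")

        # Grid over the same hyperparameters caret tunes for ranger
        ps <- makeParamSet(
          makeDiscreteParam("mtry", values = 2:3),
          makeDiscreteParam("min.node.size", values = 1:2)
        )

        # 10-fold CV repeated 2 times, matching trainControl(method = "repeatedcv")
        rdesc <- makeResampleDesc("RepCV", folds = 10, reps = 2)

        res <- tuneParams(lrn, task, resampling = rdesc, par.set = ps,
                          control = makeTuneControlGrid())
        res$x  # best hyperparameter combination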