
How does caret choose the default tuning range?


When I use the R package caret to compare multiple models on the same data set, caret selects a different tuning range for each model if I pass the same tuneLength to all of them and supply no model-specific tuneGrid.

For example, the tuning ranges chosen by caret for one particular data set are:

earth(nprune): 2, 5, 8, 11, 14

gamSpline(df): 1, 1.5, 2, 2.5, 3

rpart(cp): 0.010, 0.054, 0.116, 0.123, 0.358

How does caret determine these default tuning ranges? I have been searching through the documentation but still haven't pinned down the algorithm to choose the ranges.


Solution

  • It depends on the model. For rpart and a few others, it fits an initial model to get a sense of what reasonable values should be. In other cases, it is less intelligent. For example, for gamSpline the grid is simply expand.grid(df = seq(1, 3, length = len)), where len is the tuneLength you passed.

    You can see what it does per model using getModelInfo:

     > getModelInfo("earth")[[1]]$grid
     function(x, y, len = NULL) {
       dat <- if(is.data.frame(x)) x else as.data.frame(x)
       dat$.outcome <- y

       mod <- earth( .outcome~., data = dat, pmethod = "none")
       maxTerms <- nrow(mod$dirs)
       maxTerms <- min(200, floor(maxTerms * .75) + 2)
       data.frame(nprune = unique(floor(seq(2, to = maxTerms, length = len))),
                  degree = 1)
     }
    

    Max
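
To make the arithmetic in the two grid rules above concrete, here is a minimal base-R sketch that does not need caret or earth installed. The helper names `gam_grid` and `earth_grid` are hypothetical, and `n_terms` is an assumed stand-in for `nrow(mod$dirs)` from the initial earth fit. The value 16 is picked only because it reproduces the nprune values in the question; it is not taken from the original data.

```r
# Grid rule quoted in the answer for gamSpline:
# `len` evenly spaced df values on the interval [1, 3].
gam_grid <- function(len) {
  expand.grid(df = seq(1, 3, length.out = len))
}

# Same arithmetic as the earth grid function printed above, but with the
# initial earth() fit replaced by a hypothetical term count `n_terms`
# (standing in for nrow(mod$dirs)).
earth_grid <- function(n_terms, len) {
  max_terms <- min(200, floor(n_terms * 0.75) + 2)
  data.frame(
    nprune = unique(floor(seq(2, to = max_terms, length.out = len))),
    degree = 1
  )
}

gam_grid(5)$df
# 1.0 1.5 2.0 2.5 3.0

earth_grid(n_terms = 16, len = 5)$nprune
# 2 5 8 11 14
```

With `len = 5`, both results match the ranges listed in the question, which suggests those ranges came from `tuneLength = 5`. The gamSpline range is the same for any data set, while the earth range depends on how many terms the unpruned initial fit produces.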