r, random-forest, r-caret, party

mtry value depends on tuneGrid range even with the same seed


I am trying to find the optimal mtry value for a conditional random forest. I am using the caret::train function and found that, depending on the grid range, I get a different optimal mtry even with the same seed. Which value should I pick? Example:

  1. With the code below I got mtry=8:
library(caret)   # train()
library(party)   # cforest_unbiased()
library(dplyr)   # %>% and mutate()
set.seed(12)
data <- mtcars %>%
  mutate(target = as.factor(vs))
grid <- expand.grid(.mtry = 2:12)
mod <- train(target ~ ., data = data, method = "cforest", controls = cforest_unbiased(ntree = 500), tuneGrid = grid)
mod$bestTune
  2. With the grid 1:12 I got mtry=9:
set.seed(12)
data <- mtcars %>%
  mutate(target = as.factor(vs))
grid <- expand.grid(.mtry = 1:12)
mod <- train(target ~ ., data = data, method = "cforest", controls = cforest_unbiased(ntree = 500), tuneGrid = grid)
mod$bestTune

Solution

  • If the grid differs, the results will in general differ too, even with the same seed, because the model fits consume random numbers and a different grid changes which random numbers each fit receives. The details depend on how the train() function goes through the grid. Possibly, if you only change the end (rather than the beginning) of the grid, the results up to that point will stay the same.
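
    To illustrate why the same seed need not reproduce the same result for a given grid value, here is a toy sketch in base R. It does not use caret and makes no claim about the internals of train(); `noisy_scores` is a hypothetical stand-in for a stochastic model fit that draws one random number per grid value from a single shared stream.

    ```r
    # Toy illustration (base R only, not caret's actual mechanism).
    # noisy_scores() is a hypothetical helper: it draws one random "score"
    # per grid value, in grid order, from a single seeded stream.
    noisy_scores <- function(grid, seed) {
      set.seed(seed)
      scores <- runif(length(grid))
      names(scores) <- grid
      scores
    }

    a <- noisy_scores(2:12, seed = 12)   # grid starts at 2
    b <- noisy_scores(1:12, seed = 12)   # grid starts at 1

    # Same seed and same label "8", but the draw sits at a different
    # position in the stream, so the two scores are compared here
    print(a[["8"]] == b[["8"]])

    # Extending only the END of the grid leaves the earlier draws untouched
    a_ext <- noisy_scores(2:13, seed = 12)
    print(all(a == a_ext[names(a)]))
    ```

    Whether extending only the end of the grid is equally harmless in train() depends on how it distributes random numbers across resamples, so treat this purely as an analogy.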

    If you want to split up the computations, you could use separate grids and set a separate seed before training on each of them. Then you can easily add or drop parts. Of course, you will then have to select the best result manually across the different grids.
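
    A minimal sketch of this split-grid approach, assuming the caret and party packages are installed. The helper `fit_chunk`, the seeds 101 and 102, and the split into 1:5 and 6:10 are illustrative choices rather than anything from the question. As a further assumption, vs is dropped from the predictors because target is derived from it, which leaves 10 predictors.

    ```r
    library(caret)
    library(party)

    dat <- mtcars
    dat$target <- factor(dat$vs)
    dat$vs <- NULL   # assumption: drop vs, since target is a copy of it

    # Hypothetical helper: tune one chunk of the grid under its own seed
    fit_chunk <- function(mtry_values, seed) {
      set.seed(seed)
      train(target ~ ., data = dat, method = "cforest",
            controls = cforest_unbiased(ntree = 500),
            tuneGrid = expand.grid(mtry = mtry_values))
    }

    low  <- fit_chunk(1:5,  seed = 101)   # arbitrary seed
    high <- fit_chunk(6:10, seed = 102)   # arbitrary seed

    # Stack the resampling summaries and pick the best value by hand
    res <- rbind(low$results, high$results)
    print(res[which.max(res$Accuracy), ])
    ```

    Because each chunk is seeded on its own, adding or dropping a chunk should not change the rows contributed by the others, although each chunk also uses its own resamples.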

    Getting mtry = 8 rather than mtry = 9 does not seem like a huge difference anyway. I wouldn't be surprised if it is within the range of random variation here. However, I couldn't replicate your results because I only got warnings from the training (possibly because vs is identical to target but is still used as one of the regressors).
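
    To get a feel for that random variation, one could repeat the tuning under a few seeds and tabulate which mtry is selected. The following is only a sketch, assuming caret and party are installed: `best_mtry` is a hypothetical helper, the seeds are arbitrary, vs is removed from the predictors as an assumption, and ntree = 100 with 5-fold cross-validation is used to keep the runtime short (the question used ntree = 500 and caret's default resampling).

    ```r
    library(caret)
    library(party)

    dat <- mtcars
    dat$target <- factor(dat$vs)
    dat$vs <- NULL   # assumption: vs removed because target duplicates it

    # Hypothetical helper: return the mtry that train() selects for one seed
    best_mtry <- function(seed) {
      set.seed(seed)
      mod <- train(target ~ ., data = dat, method = "cforest",
                   controls = cforest_unbiased(ntree = 100),
                   trControl = trainControl(method = "cv", number = 5),
                   tuneGrid = expand.grid(mtry = 1:10))
      mod$bestTune$mtry
    }

    picks <- sapply(c(12, 13, 14), best_mtry)   # arbitrary seeds
    print(table(picks))
    ```

    If the selected values scatter across neighbouring mtry settings, a one-step difference such as 8 versus 9 is probably not meaningful.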