I am following this example and I want to change one part of the code from:
# default RF model
m1 <- randomForest(
formula = Sale_Price ~ .,
data = ames_train
)
# number of trees with lowest MSE
btree <- which.min(m1$mse)
to its equivalent ranger-based code. The issue is that ranger doesn't directly provide the number of trees with the lowest MSE. How can I calculate that number and store it in a variable (which I call btree)?
library(rsample) # data splitting
library(randomForest) # basic implementation
library(ranger) # a faster implementation of randomForest
set.seed(123)
ames_split <- initial_split(AmesHousing::make_ames(), prop = .7)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
# for reproducibility
set.seed(123)
# default RF model
m1 <- randomForest(
formula = Sale_Price ~ .,
data = ames_train
)
# the equivalent in ranger
m1 <- ranger(
formula = Sale_Price ~ .,
data = ames_train
)
# number of trees with lowest MSE (randomForest package)
btree <- which.min(m1$mse)
Based on the ranger documentation:
prediction.error: Overall out-of-bag prediction error. For classification this is accuracy (proportion of misclassified observations), for probability estimation the Brier score, for regression the mean squared error and for survival one minus Harrell's C-index.
So if I do:
m1 <- ranger(
formula = Sale_Price ~ .,
data = ames_train
)
# number of trees with highest r2
btree = which.max(m1$prediction.error)
print(btree)
The result is:
[1] 1
which obviously is not right.
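I don't think there is a way to get this directly from the ranger output: prediction.error is a single number for the whole forest, not a per-tree vector like randomForest's mse, which is why which.max() on it always returns 1. But you could run predictions with an increasing number of trees and calculate it yourself. For example: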
m1 <- ranger(
formula = Sale_Price ~ .,
data = ames_train,
keep.inbag = TRUE,
write.forest = TRUE
)
num_trees <- m1$num.trees
mse <- numeric(num_trees)
# MSE of the forest restricted to its first i trees
# (measured on the training data, so this is not the OOB error that randomForest stores in m1$mse)
for (i in seq_len(num_trees)) {
  pred <- predict(m1,
                  data = ames_train,
                  num.trees = i)$predictions
  mse[i] <- mean((pred - ames_train$Sale_Price)^2)
}
# number of trees with the lowest MSE
btree <- which.min(mse)
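The loop above scores the forest on the rows it was trained on, so the error tends to keep falling as trees are added. If you want something closer to randomForest's mse, which is an out-of-bag quantity, a rough sketch is to combine per-tree predictions (predict.all = TRUE) with the in-bag counts stored by keep.inbag = TRUE. The names fit, tree_pred, oob and oob_mse are my own, and I am assuming that averaging only the out-of-bag trees for each row is an acceptable stand-in for what randomForest does internally; I have not checked that the two packages agree number for number.

```r
library(rsample)
library(ranger)

set.seed(123)
ames_split <- initial_split(AmesHousing::make_ames(), prop = .7)
ames_train <- training(ames_split)

set.seed(123)
fit <- ranger(
  formula = Sale_Price ~ .,
  data = ames_train,
  keep.inbag = TRUE   # needed to know which rows each tree did not see
)

# rows x trees matrix of individual tree predictions
tree_pred <- predict(fit, data = ames_train, predict.all = TRUE)$predictions

# rows x trees logical matrix: TRUE where the row was out-of-bag for that tree
oob <- do.call(cbind, fit$inbag.counts) == 0

# drop in-bag predictions, then average the OOB predictions of the first k trees
tree_pred[!oob] <- 0
cum_sum <- t(apply(tree_pred, 1, cumsum))  # running sum of OOB predictions per row
cum_n   <- t(apply(oob, 1, cumsum))        # running count of OOB trees per row
oob_pred <- cum_sum / cum_n                # NaN while a row has no OOB tree yet

# OOB MSE for forests of size 1, 2, ..., num.trees (rows without an OOB tree are skipped)
oob_mse <- colMeans((oob_pred - ames_train$Sale_Price)^2, na.rm = TRUE)

# number of trees with the lowest OOB MSE
btree <- which.min(oob_mse)
```

The last element of oob_mse should land close to fit$prediction.error, which is a quick sanity check that the in-bag bookkeeping lines up.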