
How to get the same prediction (probability and class) in a random forest


I'm fitting two models with the ranger package, using the same seed. The first one predicts the class and the second one returns the probability matrix. My goal is to get the same result from both, but the predictions differ in 4 observations. Does anyone know the solution? For the second model I take the class with the maximum probability. What should the cut point be?

library(ranger)
library(caret)

## fit model 1
mod <- ranger(formula = Species ~., data = iris, seed = 2020)
res1 <- predict(object = mod, data = iris[,-5])$predictions

## fit model 2
mod2 <- ranger(formula = Species ~., data = iris, probability = TRUE, seed = 2020)
pred2 <- predict(object = mod2, data = iris[,-5])$predictions
res2 <- factor(levels(iris$Species)[apply(pred2, 1, which.max)],
       levels = levels(iris$Species))

head(data.frame(res1, res2))
    res1   res2
1 setosa setosa
2 setosa setosa
3 setosa setosa
4 setosa setosa
5 setosa setosa
6 setosa setosa

all.equal(res1, res2)
[1] "4 string mismatches"

My expected output

all.equal(res1, res2)
[1] TRUE

Solution

  • Very interesting question: I am a user of ranger and was not aware of this result.

    As @MrFlick stated in a comment on your question, you are using two different methods. You can confirm this by inspecting the treetype element of mod and mod2:

    mod$treetype
    [1] "Classification"

    mod2$treetype
    [1] "Probability estimation"