I'm fitting two models with the ranger package and the same seed. The first one predicts the class and the second one returns the probability matrix. My goal is to get the same result from both, but the predictions differ for 4 observations. Does anyone know the solution? I'm assigning each observation to the class with the maximum probability. What should the cut point be?
library(ranger)
library(caret)
## fit model 1
mod <- ranger(formula = Species ~., data = iris, seed = 2020)
res1 <- predict(object = mod, data = iris[,-5])$predictions
## fit model 2
mod2 <- ranger(formula = Species ~., data = iris, probability = TRUE, seed = 2020)
## probability matrix (one column per class), computed once
prob <- predict(object = mod2, data = iris[,-5])$predictions
## label each row with the class that has the highest probability
res2 <- factor(c("setosa", "versicolor", "virginica")[apply(prob, 1, which.max)],
               levels = c("setosa", "versicolor", "virginica"))
head(data.frame(res1, res2))
    res1   res2
1 setosa setosa
2 setosa setosa
3 setosa setosa
4 setosa setosa
5 setosa setosa
6 setosa setosa
all.equal(res1, res2)
[1] "4 string mismatches"
My expected output:
all.equal(res1, res2)
[1] TRUE
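As a side note on the conversion step itself: the probability matrix can be turned into labels without hard-coding the class order. The helper below (`prob_to_class` is a hypothetical name, not part of ranger) is a minimal sketch that assumes the columns of the matrix are named after the classes, which is how ranger labels the columns of a probability forest's predictions. It only tidies the labelling; it does not by itself remove the 4 mismatches.

```r
# Hypothetical helper: convert a class-probability matrix
# (rows = observations, columns = classes) into a factor of predicted labels.
# Assumes the column names are the class labels.
prob_to_class <- function(prob, levels = colnames(prob)) {
  # max.col() returns, for each row, the column index of the maximum;
  # ties.method = "first" mirrors which.max(), the default "random" does not
  idx <- max.col(prob, ties.method = "first")
  factor(colnames(prob)[idx], levels = levels)
}

# Small made-up probability matrix, for illustration only
prob <- matrix(c(0.90, 0.05, 0.05,
                 0.10, 0.60, 0.30,
                 0.00, 0.45, 0.55),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

prob_to_class(prob)
```

With the question's model this would be used as `prob_to_class(predict(mod2, data = iris[, -5])$predictions, levels = levels(iris$Species))`.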
Very interesting question: I use ranger and was not aware of this behavior.
As @MrFlick pointed out in the comments, you are using two different methods. You can confirm this by checking the treetype element of mod and mod2:
mod$treetype
[1] "Classification"
mod2$treetype
[1] "Probability estimation"
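The two tree types also aggregate differently. A classification forest lets each tree vote for one class and returns the majority vote, while a probability forest averages the trees' class probabilities, and you then take the largest average. These two rules can pick different classes even for the same set of trees. The toy numbers below are made up (not taken from iris) and are only meant to show that disagreement. On top of that, ranger's documented default min.node.size is 1 for classification and 10 for probability estimation, so the trees themselves are generally grown differently too.

```r
# Made-up per-tree class probabilities for ONE observation
# (rows = trees, columns = classes), for illustration only
tree_probs <- rbind(
  tree1 = c(A = 0.6, B = 0.4),
  tree2 = c(A = 0.6, B = 0.4),
  tree3 = c(A = 0.0, B = 1.0)
)

# Classification-style aggregation: each tree votes for its most likely
# class and the forest returns the class with the most votes
votes <- colnames(tree_probs)[apply(tree_probs, 1, which.max)]
majority_vote <- names(which.max(table(votes)))

# Probability-style aggregation: average the probabilities over trees,
# then take the class with the highest average
avg_probs <- colMeans(tree_probs)
argmax_avg <- names(which.max(avg_probs))

majority_vote  # [1] "A"  (two of three trees vote A)
argmax_avg     # [1] "B"  (averages are A = 0.4, B = 0.6)
```

So in general there is no single cut point on the averaged probabilities that will reproduce the majority vote exactly; the small number of mismatches is expected rather than a bug.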