I trained a random forest with party::cforest using n_trees trees for a regression problem (continuous response). When I call predict(type = "response"), I only get the mean of all n_trees responses. How do I get the response of each individual tree (that is, n_trees responses)? Thank you very much! I've been trying for weeks and I'm still clueless!
I also tried training the forest with partykit, but I still cannot find a way to get all responses. The documentation has an example with a quantile function. I tried getting the median of all responses (if I can't get all responses explicitly, I thought I could at least get some statistics from them) with function(y, w) median(y), but that gives me the same value for all data points. So I don't really understand how FUN is supposed to work in partykit's predict().
I also tried predict(type = "prob"), as suggested in other posts for classification random forests, but that gave me the error "cannot compute empirical distribution function with non-integer weights".
So I remain clueless. Thank you for any help!
The ntree individual predictions are actually not computed within cforest(). Instead the predictions of the forest are computed as weighted means of the original responses, where the weights depend on the new data points.
However, you can set up the ntree individual trees and compute the predictions yourself. All the necessary information is in the cforest object.
Let's consider the following simple example for the cars data using a forest with only 10 trees:
library("partykit")
set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)
Then you can obtain the predictions for two new data points:
nd <- data.frame(speed = c(10, 20))
predict(cf, newdata = nd)
##        1        2
## 22.65411 63.11666
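The weighted-mean view described above can be checked directly. The following is only a sketch: it assumes that predict() for a partykit forest accepts type = "weights" and returns one column of weights per row of newdata, and that the default scaling of the weights is left unchanged. The names w, y and manual are introduced here purely for illustration.

```r
library("partykit")

set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)
nd <- data.frame(speed = c(10, 20))

## weights of the original observations for each new data point
## (assumed layout: one row per training observation, one column per row of nd)
w <- predict(cf, newdata = nd, type = "weights")
dim(w)

## original responses as stored in the forest object
y <- cf$fitted[["(response)"]]

## weighted mean of the original responses, separately for each new data point
manual <- apply(w, 2, function(wj) weighted.mean(y, wj))

## should agree with the forest predictions
all.equal(unname(manual), unname(predict(cf, newdata = nd)))
```

If the assumptions hold, the last line returns TRUE, which illustrates that no per-tree predictions are needed to obtain the forest prediction.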
Now to replicate this we can also set up the 10 individual trees from the forest. For this we use the constparty class as also returned by ctree():
ct <- lapply(seq_along(cf$nodes), function(i) as.constparty(
  party(cf$nodes[[i]], data = cf$data, terms = cf$terms,
    fitted = data.frame(
      `(response)` = cf$fitted[["(response)"]],
      `(weights)` = cf$weights[[i]],
      check.names = FALSE))
))
To the list of 10 constparty trees you can then apply the predict() method to obtain the 10 individual predictions and compute their mean:
p <- sapply(ct, predict, newdata = nd)
dim(p)
## [1] 2 10
rowMeans(p)
##        1        2
## 22.65411 63.11666
But now you can also inspect the full 2 x 10 matrix p with the predictions from all individual trees.
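Regarding the FUN argument from the question: function(y, w) median(y) ignores the weights w, which are the only part that changes with the new data point, so it returns the same value everywhere. A possible alternative is a weighted median, sketched below. The helper wmedian is hypothetical (it is not part of partykit), and the sketch assumes that FUN is called once per new data point with the original responses y and the corresponding weights w.

```r
library("partykit")

set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)
nd <- data.frame(speed = c(10, 20))

## hypothetical helper: smallest observed response at which the
## cumulative normalized weight reaches at least 0.5
wmedian <- function(y, w) {
  o <- order(y)
  cw <- cumsum(w[o]) / sum(w)
  y[o][which(cw >= 0.5)[1]]
}

## weighted median of the original responses for each new data point
pm <- predict(cf, newdata = nd, type = "response", FUN = wmedian)
pm
```

Because the weights differ between the two rows of nd, the results are computed from different weighted samples. Note that this is a weighted median of the original responses, not the median of the 10 per-tree predictions.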