r dataframe party ctree

How to plot a conditional inference tree on a random dataset?


I need to plot a conditional inference tree. I have selected the party::ctree() function. It works on the iris dataset.

library(party)
(irisct_party <- party::ctree(Species ~ .,data = iris))
plot(irisct_party)


But when I use random data:

library(wakefield)
set.seed(123)
n <- 200
studs <- data.frame(problem = factor(answer(n, x = c("No", "Yes"))),
                    age     = round(runif(n, 18, 25)),
                    gender  = factor(answer(n, x = c("M",   "F" ))),
                    smoker  = factor(answer(n, x = c("No",  "Yes" ))),
                    before  = round(runif(n, 60, 80))
)
# data.frame() cannot see the `before` column while it is being built,
# so `after` is added in a separate step.
studs$after <- studs$before + round(runif(n, 10, 20))

(ct <-  party::ctree(problem ~ ., data = studs))
plot(ct)

I get only a single node:

Conditional inference tree with 1 terminal nodes

Response:  problem 
Inputs:  age, gender, smoker, before, after 
Number of observations:  200 

1)*  weights = 200 

Question: Why does the conditional inference tree have only 1 terminal node on random data?


Solution

  • In each node (including the root node), ctree() conducts an independence test between the dependent variable (problem in your random data) and each of the explanatory variables (age, gender, smoker, before, after). It computes the p-value for each of these tests and selects the explanatory variable with the lowest p-value for splitting, but only if that p-value is significant at the chosen significance level (after adjusting for testing multiple explanatory variables). In your data this is not the case because the dependent variable has, in fact, been sampled independently of the explanatory ones. Therefore, the algorithm stops and does not split the root node.

    Remarks: It is recommended to use the successor package partykit rather than party for fitting ctree(). See also the accompanying vignette("ctree", package = "partykit") for further details.
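
To see the stopping rule in action, here is a minimal sketch that fits the same kind of tree twice: once with a response sampled independently of the predictors, and once with a response that depends on age by construction. Note the assumptions: the data are generated with base R's sample() rather than wakefield, the dependent response (age > 21) is purely hypothetical, and n_terminal() is a helper defined here, not part of party. The access path into the node's criterion slot reflects my understanding of party's node structure, so treat it as an assumption too.

```r
library(party)

set.seed(123)
n <- 200

# Predictors drawn with base R (an assumption, to avoid the wakefield dependency)
studs <- data.frame(
  age    = round(runif(n, 18, 25)),
  gender = factor(sample(c("M", "F"), n, replace = TRUE)),
  smoker = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  before = round(runif(n, 60, 80))
)
studs$after <- studs$before + round(runif(n, 10, 20))

# Case 1: response sampled independently of every predictor
studs$problem <- factor(sample(c("No", "Yes"), n, replace = TRUE))
ct_indep <- ctree(problem ~ ., data = studs)

# Case 2: hypothetical response that depends on age by construction
studs_dep <- studs
studs_dep$problem <- factor(ifelse(studs_dep$age > 21, "Yes", "No"))
ct_dep <- ctree(problem ~ ., data = studs_dep)

# Helper (not part of party): count the distinct terminal nodes that
# the training observations fall into
n_terminal <- function(tree) length(unique(where(tree)))

n_terminal(ct_indep)  # usually 1: no test is significant, so no split
n_terminal(ct_dep)    # more than 1: the age test is clearly significant

# Root-node test results; party stores the criterion as 1 - p-value,
# so this should give the adjusted p-value per predictor (assumed slot layout)
1 - nodes(ct_indep, 1)[[1]]$criterion$criterion
```

With an independent response, a split can still occur occasionally, because the test has a false-positive rate of roughly the significance level (5% by default). Lowering mincriterion in ctree_control() would force splits, but on data like this they would only reflect noise.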