For a node x in a partykit::ctree object, I use the following lines to get the splitting variable of the node:
k=info_node(x)
names(k$p.value)
However, the splitting variable returned by this code differs from the one shown on the tree created by plot. It turns out that three columns in k$criterion have the minimum p-value, i.e.,
inds=which(k$criterion['p.value',]==k$p.value)
length(inds) #3
It seems that info_node(x) returns the 1st of the three variables as names(k$p.value), but plot chooses the 3rd one. I wonder if this discrepancy has one of two causes:
Multiple variables have the minimum p-value, and there is an internal method to break such a tie in selecting only one splitting variable.
Maybe these three variables have slightly different p-values, but because of the fixed p-value precision in k$criterion, they appear to have the same p-value.
Any insight is appreciated!
The comparisons are done internally on the log-p-value scale, so they are more reliable when the p-values are tiny. If ties (within machine precision) still remain for the p-value, they are broken based on the size of the corresponding test statistic: among the tied variables, the one with the largest statistic is selected.
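The two-step rule described above can be sketched in base R. This is only an illustration of the idea, not partykit's internal code: the vectors log_p and statistic, the variable names x1 to x4, all numeric values, and the tolerance tol are made up for the example.

```r
# Hypothetical log-p-values and test statistics for four candidate variables.
# These numbers are invented for illustration and come from no real tree.
log_p     <- c(x1 = -800,   x2 = -50,   x3 = -800,   x4 = -790)
statistic <- c(x1 = 3600.2, x2 = 210.4, x3 = 3615.9, x4 = 3570.1)

# On the raw scale, very small p-values underflow to exactly 0 and look tied.
raw_p <- exp(log_p)
n_raw_ties <- sum(raw_p == min(raw_p))

# Step 1: compare on the log scale and collect ties within a tolerance
# (the tolerance is an assumption chosen for this sketch).
tol  <- sqrt(.Machine$double.eps)
ties <- which(abs(log_p - min(log_p)) < tol)

# Step 2: among the remaining ties, pick the largest test statistic.
selected <- names(ties)[which.max(statistic[ties])]

n_raw_ties   # number of variables that look tied on the raw scale
names(ties)  # variables still tied on the log scale
selected     # variable chosen after the tie-break
```

In this made-up example, x1, x3 and x4 are indistinguishable on the raw p-value scale, the log scale separates x4 from the other two, and the test statistic then decides between x1 and x3. A selection based only on the first minimal p-value could therefore report a different variable than the one actually used for the split.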