For illustration purposes, I am trying to modify the terminal nodes of a ctree in partykit.
I created some data and fitted an unpruned decision tree. Unsurprisingly the tree grows very large, and I have problems illustrating the tree properly.
Here is the code, where I created some data and then fitted the tree:
library(partykit)
# -------------------------------------------------------------------------
# a function that creates some data for me
dgp_math_s <- function(ni, nj, RI_sd, sigma2 = 1,
                       gamma00 = 0, gamma01 = 0, gamma10 = 0, gamma02 = 0, gamma20 = 0) {
  dgp_grid <- expand.grid(
    ni = 1:ni,
    nj = 1:nj,
    gamma02 = NA,
    gamma20 = NA,
    studying = NA,
    atmosphere = NA,
    math_score = NA, # with two predictor variables
    Rij = NA,
    U0j = NA
  )
  # create a random binary level 2 predictor z1j, same value for the whole cluster
  dgp_grid$atmosphere <- rep(rbinom(nj, 1, 0.5), each = ni)
  # create level 2 residual
  dgp_grid$U0j <- rep(rnorm(nj, mean = 3, sd = RI_sd), each = ni)
  # create level 1 residual with variance sigma2 (default 1)
  dgp_grid$Rij <- rnorm(ni * nj, mean = 3, sd = sqrt(sigma2))
  # create level 1 predictor x1ij, sampled uniformly from 0:5
  dgp_grid$studying <- sample(0:5, ni * nj, replace = TRUE)
  # create yij from the two predictors, which have no effect (gammas are 0)
  dgp_grid$math_score <-
    gamma00 + gamma10 * dgp_grid$studying + gamma01 * dgp_grid$atmosphere +
    dgp_grid$U0j + dgp_grid$Rij
  return(dgp_grid)
}
# -------------------------------------------------------------------------
# fitting the tree
# create data
dgp_math <- dgp_math_s(ni = 20, nj = 20, RI_sd = 2, gamma10 = 0, gamma01 = 0)
# fit unpruned tree
diab_model <- partykit::ctree(math_score ~ studying + atmosphere, data = dgp_math,
  control = ctree_control(mincriterion = 0.005, minsplit = 0, minbucket = 0))
# plot the tree
plot(diab_model, gp = gpar(fontsize = 7))
When plotting the tree now it looks something like this:
As you can see, the text over the nodes is not entirely visible.
I tried to change the font size with gp = gpar(fontsize = 7), but it was not enough. I also tried to change the terminal node with something like terminal_panel = node_boxplot(id = FALSE), but that didn't work either.
Any ideas on how I can change the text of the terminal node so that it does not show the n = xx part?
Or any other ideas on how I can improve the terminal node so that I can plot it nicely?
Thanks!
The node_*() panel-generating functions need the fitted tree object as their first argument. Thus, to set id = FALSE you would have to do:
plot(diab_model, terminal_panel = node_boxplot(diab_model, id = FALSE))
In order not to repeat the object name (here: diab_model), one can also use the terminal panel arguments tp_args, which is particularly convenient when you use the default panel function anyway (here: node_boxplot()). Thus, the following is equivalent to the call above:
plot(diab_model, tp_args = list(id = FALSE))
For the most flexible formatting of the main label you can use the mainlab argument. It can be a function(id, nobs), which lets you decide if and how the node id and the number of observations are shown. A compact display would be "id: n=..." which I set up below. Additionally, I reduce the amount of space between the terminal panels:
mylab <- function(id, nobs) sprintf("%s: n=%s", id, nobs)
plot(diab_model, tp_args = list(mainlab = mylab, ylines = 1.5))
Instead of setting the font size, I simply plotted the display onto a large device of 15 x 8 inches.
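One way to get such a large device is a file device like pdf(). The following is only a minimal sketch: it uses the built-in cars data as a stand-in for the tree above, and the names toy_tree and out_file are made up for illustration.

```r
library(partykit)

# Stand-in tree on a built-in data set (assumption: used here only so the
# example runs on its own; substitute your own fitted ctree)
toy_tree <- ctree(dist ~ speed, data = cars)

# Hypothetical output file; replace with your own path
out_file <- tempfile(fileext = ".pdf")

# Open a 15 x 8 inch PDF device, draw the tree, then close the device
pdf(out_file, width = 15, height = 8)
plot(toy_tree, tp_args = list(id = FALSE))
dev.off()
```

Because the device size is set in inches, the tree gets more room without shrinking the fonts; png() or an on-screen device with explicit width and height should work similarly.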
The following discussion provides an example of how you can further customize the mainlab: partykit: Displaying terminal node percentile values above terminal node boxplots
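To drop the n = xx part entirely, as asked in the question, the same mainlab mechanism can return only the node id. This is a minimal sketch, again using the built-in cars data as a stand-in for the tree above; the names toy_tree and id_only_lab are made up for illustration.

```r
library(partykit)

# Stand-in tree on a built-in data set (assumption: replace with your own tree)
toy_tree <- ctree(dist ~ speed, data = cars)

# Label function that ignores nobs and shows only the node id
id_only_lab <- function(id, nobs) sprintf("Node %s", id)

# Pass the label function to the default terminal panel via tp_args
plot(toy_tree, tp_args = list(mainlab = id_only_lab, ylines = 1.5))
```

Since the function receives both id and nobs, any combination of the two (or neither) can be shown by changing the sprintf() format.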