Changing terminal node in partykit


For illustration purposes, I am trying to modify the terminal nodes of a ctree in partykit.

I created some data and fitted an unpruned decision tree. Unsurprisingly, the tree grows very large, and I have trouble displaying it properly.

Here is the code that creates the data and then fits the tree:

library(partykit)

# -------------------------------------------------------------------------
# a function that creates some data for me

dgp_math_s <- function(ni,nj, RI_sd, sigma2 = 1,
                gamma00 = 0, gamma01 = 0, gamma10 = 0, gamma02 = 0, gamma20 = 0){
  
  dgp_grid <- expand.grid(
    ni = 1:ni,
    nj = 1:nj,
    gamma02 = NA,
    gamma20 = NA,
    studying = NA,
    atmosphere = NA,
    math_score = NA, # with two predictor variables
    Rij = NA,
    U0j = NA
  )
  
  dgp_grid$atmosphere <- rep(rbinom(nj, 1, 0.5), each = ni)
  # create a random binary level-2 predictor z1j, same value for the whole cluster
  
  dgp_grid$U0j <- rep(rnorm(nj, mean = 3, sd = RI_sd), each = ni)
  #create level 2 residual 
  
  dgp_grid$Rij <- rnorm(ni*nj, mean = 3, sd = sqrt(sigma2))
  # create level 1 residual with sigma2 = 1
  
  dgp_grid$studying <- sample(0:5, ni*nj, replace = TRUE)
  # create level-1 explanatory/predictor variable x1ij (integers 0 to 5, drawn uniformly)
  
  dgp_grid$math_score <-
    gamma00 + gamma10 * dgp_grid$studying + gamma01 * dgp_grid$atmosphere +
    dgp_grid$U0j + dgp_grid$Rij
  # create yij from the two predictors (no effect when gamma10 = gamma01 = 0)

  return(dgp_grid)
}
# -------------------------------------------------------------------------
#fitting the tree

dgp_math <- dgp_math_s(ni = 20, nj = 20, RI_sd = 2, gamma10 = 0, gamma01 = 0)
# create data

diab_model <- partykit::ctree(math_score ~ studying + atmosphere, data = dgp_math,
  control = ctree_control(mincriterion = 0.005, minsplit = 0, minbucket = 0))
# fit unpruned tree

plot(diab_model, gp = gpar(fontsize = 7))
# plot the tree
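To put a number on "very large", partykit's width(), depth() and length() methods report the number of terminal nodes (each one becomes a boxplot panel), the depth of the tree, and the total number of nodes. The sketch below has to run on its own, so it uses the built-in airquality data and the illustrative object name big_tree instead of dgp_math. The control settings only approximate the ones above: minsplit = 2 and minbucket = 1 are my assumptions. With the question's data you would call the same functions on diab_model.

```r
library(partykit)

# Built-in example data with missing Ozone values dropped (stand-in for dgp_math)
airq <- subset(airquality, !is.na(Ozone))

# Deliberately large tree, grown with settings similar to the question's
big_tree <- ctree(Ozone ~ Temp + Wind, data = airq,
                  control = ctree_control(mincriterion = 0.005,
                                          minsplit = 2, minbucket = 1))

width(big_tree)   # number of terminal nodes, i.e. boxplot panels in the plot
depth(big_tree)   # depth of the tree (root not counted by default)
length(big_tree)  # total number of nodes, inner and terminal
```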

When I plot the tree, it looks something like this:

[Image: plot of the unpruned ctree]

As you can see, the text above the nodes is not entirely visible. I tried to change the font size with gp = gpar(fontsize = 7), but it was not enough. I also tried to change the terminal node with something like terminal_panel = node_boxplot(id = FALSE), but that didn't work either.

Any ideas on how I can change the text of the terminal node so that it does not show the n = xx part?

Or any other ideas on how I can improve the terminal node so that I can plot it nicely?

Thanks!


Solution

  • The node_*() panel-generating functions need the fitted tree object as their first argument. Thus, to set id = FALSE you would have to do:

    plot(diab_model, terminal_panel = node_boxplot(diab_model, id = FALSE))
    

    To avoid repeating the object name (here: diab_model), you can instead pass the terminal panel arguments via tp_args. This is particularly convenient when you use the default panel function anyway (here: node_boxplot()). Thus, the following is equivalent to the call above:

    plot(diab_model, tp_args = list(id = FALSE))
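    A self-contained sketch of the two equivalent calls, assuming the built-in airquality data (the names airq, airct and out are illustrative). Both plots are written to a temporary PDF file so that the snippet also runs without a screen device:

    ```r
    library(partykit)

    airq <- subset(airquality, !is.na(Ozone))
    airct <- ctree(Ozone ~ ., data = airq)

    # Temporary PDF file instead of a screen device
    out <- tempfile(fileext = ".pdf")
    pdf(out, width = 10, height = 6)

    # Explicit panel function: the fitted tree is its first argument
    plot(airct, terminal_panel = node_boxplot(airct, id = FALSE))

    # Same display: only the panel arguments are passed, via tp_args
    plot(airct, tp_args = list(id = FALSE))

    dev.off()
    ```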
    

    For the most flexible formatting of the main label, you can use the mainlab argument. If it is a function(id, nobs), you decide whether and how the node ID and the number of observations are shown. A compact display would be "id: n=...", which I set up below. Additionally, I reduce the amount of space between the terminal panels:

    mylab <- function(id, nobs) sprintf("%s: n=%s", id, nobs)
    plot(diab_model, tp_args = list(mainlab = mylab, ylines = 1.5))
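    Because the question asks for labels without the "n = xx" part, the same mechanism can also drop the count entirely. A minimal sketch, assuming the built-in airquality data. lab_id_only() and lab_blank() are hypothetical helper names, and both simply ignore their nobs argument:

    ```r
    library(partykit)

    # Hypothetical label functions; mainlab receives the node id and the
    # number of observations in the terminal node
    lab_id_only <- function(id, nobs) sprintf("Node %s", id)  # id only, no count
    lab_blank   <- function(id, nobs) ""                       # no label at all

    airq <- subset(airquality, !is.na(Ozone))
    airct <- ctree(Ozone ~ ., data = airq)

    out <- tempfile(fileext = ".pdf")
    pdf(out, width = 10, height = 6)
    plot(airct, tp_args = list(mainlab = lab_id_only, ylines = 1.5))
    dev.off()
    ```

    Passing lab_blank instead removes the label line altogether, leaving more vertical room for the boxplots.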
    

    [Image: ctree plot with custom mainlab and reduced ylines]

    Instead of setting the font size, I have simply plotted the display onto a large device of 15 x 8 inches.
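    A sketch of that approach, assuming the built-in airquality data. It opens a 15 x 8 inch PDF device, plots the tree, and closes the device. The temporary file name is only a placeholder:

    ```r
    library(partykit)

    airq <- subset(airquality, !is.na(Ozone))
    airct <- ctree(Ozone ~ ., data = airq)

    # A wide page gives every terminal panel more room, so the labels fit
    # without shrinking the font
    pdf_file <- tempfile(fileext = ".pdf")
    pdf(pdf_file, width = 15, height = 8)
    plot(airct)
    dev.off()
    ```

    For a bitmap file, png() accepts the same size when it is called with units = "in" and a resolution res.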

    The following discussion provides an example of how you can further customize the mainlab: partykit: Displaying terminal node percentile values above terminal node boxplots
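    One possible customization along those lines is sketched below. It is not the code from the linked discussion. The label function looks up a per-node statistic that is computed beforehand. This assumes the built-in airquality data and that the id passed to mainlab is the node id as a character string, which is consistent with the default "Node 3 (n = ...)" labels. node_id, node_median and lab_median() are hypothetical names:

    ```r
    library(partykit)

    airq <- subset(airquality, !is.na(Ozone))
    airct <- ctree(Ozone ~ ., data = airq)

    # Terminal node id of each training observation
    node_id <- predict(airct, type = "node")

    # Median response per terminal node, as a numeric vector named by node id
    node_median <- sapply(split(as.numeric(airq$Ozone), node_id), median)

    # Label with node id, count, and median; the character id is used to
    # look up the matching median by name
    lab_median <- function(id, nobs) {
      sprintf("%s: n=%s, med=%.1f", id, nobs, node_median[[id]])
    }

    out <- tempfile(fileext = ".pdf")
    pdf(out, width = 12, height = 6)
    plot(airct, tp_args = list(mainlab = lab_median))
    dev.off()
    ```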