Tags: r, plot, tree, rpart

Prp plot - Coloring positive and negative values differently


I am fitting regression trees with the function rpart(). With my data, the node estimates can be both positive and negative. Is there a way to color them differently?

In particular, what I would like to have is a tree whose nodes are shaded in blue for negative values and in red for positive values, where darker colors signal stronger absolute values.

I attach a minimal reproducible example.

library(rpart)
library(rpart.plot)

# Simulating data.
set.seed(1986)

X = matrix(rnorm(2000, 0, 1), nrow = 1000, ncol = 2) 
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)

y = X[, 1] + X[, 2] + epsilon

dta = data.frame(X, y)

# Fitting regression tree.
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3) 

# Plotting.
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "" ,
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3)

# Repeating, with median(y) != 0.
X = matrix(rnorm(2000, 5, 1), nrow = 1000, ncol = 2) 
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)

my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3) 

# HERE I NEED HELP!
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "" ,
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3)

As far as I understand, the box.palette option gave me the result I need in the first setting only because median(y) is close to zero.

In the second setting the result is not what I want: I get blue shades for values below median(y) and red shades for values above it. How can I impose zero as the threshold between the two colors?

More specifically, I would like a command that automatically enforces this two-color scheme, split at zero, in any tree.
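
A quick way to see which side of zero each node falls on is to inspect the fitted values stored in the rpart object. This is a minimal sketch that re-creates the second setting; the seed and the names `fit` and `node_values` are illustrative choices, not part of the original example.

```r
library(rpart)

# Re-create the second setting (covariates centered at 5); the seed is arbitrary.
set.seed(1986)
X <- matrix(rnorm(2000, 5, 1), nrow = 1000, ncol = 2)
y <- X[, 1] + X[, 2] + rnorm(1000, 0, 0.01)
dta <- data.frame(X, y)

fit <- rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)

# frame$yval holds the fitted value (the node mean for method = "anova")
# of every node, root included.
node_values <- fit$frame$yval
range(node_values)

# Count nodes on each side of zero: this is the split the colors should follow.
table(sign = ifelse(node_values < 0, "negative", "positive"))
```

In this setting every node value is positive, so a palette split at zero would be expected to put all boxes on the red side, whereas the plot above still shows blue boxes.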


Solution

  • OK, I answered my own question. The solution is actually quite simple: if box.palette is a two-color diverging palette (as in my example), the pal.thresh argument sets the threshold at which the two colors split. In my case:

    prp(my.tree,
        type = 2,
        clip.right.labs = FALSE,
        extra = 101,
        under = FALSE,
        under.cex = 1,
        fallen.leaves = TRUE,
        box.palette = "BuRd",
        branch = 1,
        round = 0,
        leaf.round = 0,
        prefix = "" ,
        main = "",
        cex.main = 1.5,
        branch.col = "gray",
        branch.lwd = 3,
        pal.thresh = 0) # HERE THE SOLUTION!
    

    Even if this is probably bad for me, I will leave the answer here for future users and close the question rather than delete it.
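
To get the zero split "automatically" for any tree, one option is to wrap prp() in a small helper that always passes pal.thresh = 0. The sketch below is only an illustration: `prp_zero_split` and its `palette` argument are hypothetical names, not part of rpart.plot, and the simulated data simply mirrors the first setting of the question.

```r
library(rpart)
library(rpart.plot)

# Hypothetical helper (not part of rpart.plot): plot any rpart tree with a
# diverging palette whose two color families always split at zero.
prp_zero_split <- function(tree, palette = "BuRd", ...) {
  prp(tree, box.palette = palette, pal.thresh = 0, ...)
}

# Small simulated example with node estimates on both sides of zero.
set.seed(1986)
X <- matrix(rnorm(2000, 0, 1), nrow = 1000, ncol = 2)
y <- X[, 1] + X[, 2] + rnorm(1000, 0, 0.01)
dta <- data.frame(X, y)
fit <- rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)

# Extra arguments are passed straight through to prp().
res <- prp_zero_split(fit, type = 2, extra = 101, fallen.leaves = TRUE)
```

Because the threshold is fixed inside the helper, the same call can be reused on trees whose response is not centered at zero. Other prp() arguments (branch.col, round, and so on) go through `...`; a different diverging palette should go through `palette` rather than box.palette.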