rsplittreedecision-treectree

Accessing variables in a REEMctree model


In a simulation study, I'm using the REEMctree model provided by Fu & Simonoff (2015).

They created a function with which one can fit REEMctree models. The function creates an object in which I'd like to search for every variable which has been selected for a split. I'm only interested in whether a variable is used in a split or not. I've tried some things but was unable to achieve that.

This is the function provided by Fu & Simonoff (2015):

# Load following package first
library(party)
#
#-------------------------start of main function----------------------------------------------------
REEMctree_function <- function (formula, data, random, subset = NULL, initialRandomEffects = rep(0, 
                                                                                        TotalObs), ErrorTolerance = 0.001, MaxIterations = 1000, 
                       verbose = FALSE, lme.control = lmeControl(returnObject = TRUE), ctree.control = party::ctree_control(mincriterion = 0.95),
                       method = "REML", correlation = NULL) 
{
  TotalObs <- dim(data)[1]
  if (identical(subset, NULL)) {
    subs <- rep(TRUE, dim(data)[1])
  }
  else {
    subs <- subset
  }
  Predictors <- paste(attr(terms(formula), "term.labels"), 
                      collapse = "+")
  TargetName <- formula[[2]]
  if (length(TargetName) > 1) 
    TargetName <- TargetName[3]
  if (verbose) 
    print(paste("Target variable: ", TargetName))
  Target <- data[, toString(TargetName)]
  ContinueCondition <- TRUE
  iterations <- 0
  AdjustedTarget <- Target - initialRandomEffects
  oldlik <- -Inf
  newdata <- data
  newdata[, "SubsetVector"] <- subs
  while (ContinueCondition) {
    newdata[, "AdjustedTarget"] <- AdjustedTarget
    iterations <- iterations + 1
    
    tree <- party::ctree(formula(paste(c("AdjustedTarget", Predictors), collapse = "~")), data = newdata, subset = subs, controls = ctree.control)  
    
    if (verbose) 
      print(tree)
    newdata[, "nodeInd"] <- 0
    newdata[subs, "nodeInd"] <- where(tree)
    if (min(where(tree)) == max(where(tree))) { #it doesn't split on root
      lmefit <- lme(formula(paste(c(toString(TargetName), 
                                    1), collapse = "~")), data = newdata, random = random, 
                    subset = SubsetVector, method = method, control = lme.control, 
                    correlation = correlation)
    }
    else {
      lmefit <- lme(formula(paste(c(toString(TargetName), 
                                    "as.factor(nodeInd)"), collapse = "~")), data = newdata, 
                    random = random, subset = SubsetVector, method = method, 
                    control = lme.control, correlation = correlation)
    }
    if (verbose) {
      print(lmefit)
      print(paste("Estimated Error Variance = ", lmefit$sigma))
      print("Estimated Random Effects Variance = ")
      print(as.matrix(lmefit$modelStruct$reStruct[[1]]) * 
              lmefit$sigma^2)
    }
    newlik <- logLik(lmefit)
    if (verbose) 
      print(paste("Log likelihood: ", newlik))
    ContinueCondition <- (newlik - oldlik > ErrorTolerance & 
                            iterations < MaxIterations)
    oldlik <- newlik
    AllEffects <- lmefit$residuals[, 1] - lmefit$residuals[, 
                                                           dim(lmefit$residuals)[2]]
    AdjustedTarget[subs] <- Target[subs] - AllEffects
  }
  residuals <- rep(NA, length = length(Target))
  residuals[subs] <- Target[subs] - predict(lmefit)
  attr(residuals, "label") <- NULL 
  
  result <- list(Tree = tree, EffectModel = lmefit, RandomEffects = ranef(lmefit), 
                 BetweenMatrix = as.matrix(lmefit$modelStruct$reStruct[[1]]) * 
                   lmefit$sigma^2, ErrorVariance = lmefit$sigma^2, data = data, 
                 logLik = newlik, IterationsUsed = iterations, Formula = formula, 
                 Random = random, Subset = subs, ErrorTolerance = ErrorTolerance, 
                 correlation = correlation, residuals = residuals, method = method, 
                 lme.control = lme.control, ctree.control = ctree.control)
  class(result) <- "REEMctree"
  return(result)
}
#-------------------------end of main function----------------------------------------------------
#--------------------------------------------------------------------------------------------------

Here I'm using this function to fit it to some random data:

# Generate some example data
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- sin(2 * pi * x1) + x2 + rnorm(n)
group <- factor(rep(1:10, each = 10))

df <- data.frame(x1 = x1, x2 = x2, y = y, group = group)


# fitting the REEMctree ---------------------------------------------------
REEM.ctree.result<-REEMctree_function(y~x1+x2, data=df, random=~1|group)
plot(REEM.ctree.result$Tree)

I tried this to extract the variables:

unlist_res <- unlist(REEM.ctree.result$Tree)

"x1" %in% unlist_res
"x2" %in% unlist_res

I get the following Error:

Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

I also tried this code (which I'm using to do the exact same thing but with a 'normal' ctree):

all_splits <- unlist(REEM.ctree.result@tree)[which(grepl("psplit.variableName", names(unlist(REEM.ctree.result@tree))))]

"x1" %in% all_splits
"x2" %in% all_splits

Here I get this Error:

Error in unlist(REEM.ctree.result@tree) : trying to get slot "tree" from an object (class "REEMctree") that is not an S4 object

Any ideas on how I might come by those variables?

Thanks!

Paper: Fu, W., & Simonoff, J. S. (2015). Unbiased regression trees for longitudinal and clustered data. Computational Statistics & Data Analysis, 88, 53-74.


Solution

  • You can access the variables by 'capture' the Tree and then searching in it. A bit rough but it works.

    capture_REEMctree <- capture.output(my_REEMctree$Tree@tree)
    grep("x1", capture_REEMctree )
    grep("x2", capture_REEMctree )