In a simulation study, I'm using the REEMctree model provided by Fu & Simonoff (2015).
They created a function with which one can fit REEMctree models. The function creates an object in which I'd like to search for every variable which has been selected for a split. I'm only interested in whether a variable is used in a split or not. I've tried some things but was unable to achieve that.
This is the function provided by Fu & Simonoff (2015):
# Load following package first
library(party)
#
#-------------------------start of main function----------------------------------------------------
REEMctree_function <- function (formula, data, random, subset = NULL, initialRandomEffects = rep(0,
TotalObs), ErrorTolerance = 0.001, MaxIterations = 1000,
verbose = FALSE, lme.control = lmeControl(returnObject = TRUE), ctree.control = party::ctree_control(mincriterion = 0.95),
method = "REML", correlation = NULL)
{
TotalObs <- dim(data)[1]
if (identical(subset, NULL)) {
subs <- rep(TRUE, dim(data)[1])
}
else {
subs <- subset
}
Predictors <- paste(attr(terms(formula), "term.labels"),
collapse = "+")
TargetName <- formula[[2]]
if (length(TargetName) > 1)
TargetName <- TargetName[3]
if (verbose)
print(paste("Target variable: ", TargetName))
Target <- data[, toString(TargetName)]
ContinueCondition <- TRUE
iterations <- 0
AdjustedTarget <- Target - initialRandomEffects
oldlik <- -Inf
newdata <- data
newdata[, "SubsetVector"] <- subs
while (ContinueCondition) {
newdata[, "AdjustedTarget"] <- AdjustedTarget
iterations <- iterations + 1
tree <- party::ctree(formula(paste(c("AdjustedTarget", Predictors), collapse = "~")), data = newdata, subset = subs, controls = ctree.control)
if (verbose)
print(tree)
newdata[, "nodeInd"] <- 0
newdata[subs, "nodeInd"] <- where(tree)
if (min(where(tree)) == max(where(tree))) { #it doesn't split on root
lmefit <- lme(formula(paste(c(toString(TargetName),
1), collapse = "~")), data = newdata, random = random,
subset = SubsetVector, method = method, control = lme.control,
correlation = correlation)
}
else {
lmefit <- lme(formula(paste(c(toString(TargetName),
"as.factor(nodeInd)"), collapse = "~")), data = newdata,
random = random, subset = SubsetVector, method = method,
control = lme.control, correlation = correlation)
}
if (verbose) {
print(lmefit)
print(paste("Estimated Error Variance = ", lmefit$sigma))
print("Estimated Random Effects Variance = ")
print(as.matrix(lmefit$modelStruct$reStruct[[1]]) *
lmefit$sigma^2)
}
newlik <- logLik(lmefit)
if (verbose)
print(paste("Log likelihood: ", newlik))
ContinueCondition <- (newlik - oldlik > ErrorTolerance &
iterations < MaxIterations)
oldlik <- newlik
AllEffects <- lmefit$residuals[, 1] - lmefit$residuals[,
dim(lmefit$residuals)[2]]
AdjustedTarget[subs] <- Target[subs] - AllEffects
}
residuals <- rep(NA, length = length(Target))
residuals[subs] <- Target[subs] - predict(lmefit)
attr(residuals, "label") <- NULL
result <- list(Tree = tree, EffectModel = lmefit, RandomEffects = ranef(lmefit),
BetweenMatrix = as.matrix(lmefit$modelStruct$reStruct[[1]]) *
lmefit$sigma^2, ErrorVariance = lmefit$sigma^2, data = data,
logLik = newlik, IterationsUsed = iterations, Formula = formula,
Random = random, Subset = subs, ErrorTolerance = ErrorTolerance,
correlation = correlation, residuals = residuals, method = method,
lme.control = lme.control, ctree.control = ctree.control)
class(result) <- "REEMctree"
return(result)
}
#-------------------------end of main function----------------------------------------------------
#--------------------------------------------------------------------------------------------------
Here I'm using this function to fit it to some random data:
# Generate some example data
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- sin(2 * pi * x1) + x2 + rnorm(n)
group <- factor(rep(1:10, each = 10))
df <- data.frame(x1 = x1, x2 = x2, y = y, group = group)
# fitting the REEMctree ---------------------------------------------------
REEM.ctree.result<-REEMctree_function(y~x1+x2, data=df, random=~1|group)
plot(REEM.ctree.result$Tree)
I tried this to extract the variables:
unlist_res <- unlist(REEM.ctree.result$Tree)
"x1" %in% unlist_res
"x2" %in% unlist_res
I get the following Error:
Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments
I also tried this code (which I'm using to do the exact same thing but with a 'normal' ctree):
all_splits <- unlist(REEM.ctree.result@tree)[which(grepl("psplit.variableName", names(unlist(REEM.ctree.result@tree))))]
"x1" %in% all_splits
"x2" %in% all_splits
Here I get this Error:
Error in unlist(REEM.ctree.result@tree) : trying to get slot "tree" from an object (class "REEMctree") that is not an S4 object
Any ideas on how I might come by those variables?
Thanks!
Paper: Fu, W., & Simonoff, J. S. (2015). Unbiased regression trees for longitudinal and clustered data. Computational Statistics & Data Analysis, 88, 53-74.
You can access the variables by 'capture' the Tree and then searching in it. A bit rough but it works.
capture_REEMctree <- capture.output(my_REEMctree$Tree@tree)
grep("x1", capture_REEMctree )
grep("x2", capture_REEMctree )