Sometimes when working with this package, I only want to assess heterogeneity in one parameter or another. However, I don't think I can do that and then fit a more complete model to the terminal nodes in one step. Is there a way to do that? Here's what I think the code should look like, but it does not work:
```r
library("partykit")

full_mod <- function(y, x, weights = NULL, start = NULL, offset = NULL, ...) {
  lm(y ~ x + 1, ...)
}

tree_1 <- mob(
  # assess heterogeneity in slope, ignoring intercepts
  Sepal.Length ~ 0 + Sepal.Width | Species,
  data = iris,
  # fit each terminal node WITH intercepts
  fit = full_mod
)
```
The following two-step approach achieves what I want, but I'm looking for a single-step way:
```r
library("partykit")
library("dplyr")

tree2 <- lmtree(
  Sepal.Length ~ 0 + Sepal.Width | Species,
  data = iris
)

iris <- iris %>%
  mutate(prediction = predict(tree2, type = 'node'))

# nest_by() already returns a rowwise data frame, so no extra rowwise() is needed
lms <- iris %>%
  nest_by(prediction) %>%
  summarize(linear_model = list(lm(Sepal.Length ~ Sepal.Width, data = data)))
```
I see that this is not the best method with continuous variables, but with dichotomous predictors I think this could be very powerful, and I would like to write some code to do this and assess this variant of the model (as long as there is not another way to do it).
ADDED ON 1st EDIT: Perhaps an alternative way to fit this type of model would be to optimize fit based on homogeneity in a chosen regression parameter (rather than the entire model-based deviance, log-likelihood, etc.). I'm happy with either solution, but I personally had more trouble trying the latter.
Thank you! Christopher Loan
In `mob_control()` you can specify the `parm` argument. This means that only a certain subset of the parameters, say `parm = 2` (the second parameter) or `parm = "x"` (the coefficient of `x`), gets tested for parameter instability.
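For example, `lmtree()` forwards extra arguments through to `mob_control()`, so something like the following should keep intercepts in every node's model while testing instability only in the slope (a sketch, assuming the iris setup from the question):

```r
library("partykit")

## full model (intercept + slope) in each node, but parameter
## instability is tested only for the Sepal.Width coefficient;
## parm is passed through to mob_control()
tree_parm <- lmtree(
  Sepal.Length ~ Sepal.Width | Species,
  data = iris,
  parm = "Sepal.Width"
)
```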
However, the catch is that once a variable is selected for splitting, the best split point is searched for by optimizing the overall objective function (e.g., error sum of squares or log-likelihood) of the model. Thus, the split point will still be sensitive to changes in all parameters of the model.
A better alternative for fixing some parameters globally and only splitting with respect to others is to iterate between:

1. estimating the global (fixed) parameters, given the current tree, and
2. estimating the tree, given the current global parameters.
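For the iris example, this alternation (re-estimate a global intercept given the tree, then re-grow the tree given the global intercept) can be sketched by hand. This is only a rough illustration using partykit's `lmtree()`, not the actual palmtree implementation, and a fixed number of iterations stands in for a proper convergence check:

```r
library("partykit")

## illustrative scheme: global intercept alpha, node-specific slopes
alpha <- 0
for (i in 1:10) {
  ## (2) estimate the tree, given the current global intercept
  iris$y_adj <- iris$Sepal.Length - alpha
  tr <- lmtree(y_adj ~ 0 + Sepal.Width | Species, data = iris)

  ## (1) estimate the global intercept, given the node-specific slopes
  fit_local <- predict(tr, newdata = iris)  # slope-only fitted values
  alpha <- mean(iris$Sepal.Length - fit_local)
}
alpha  # global intercept estimate after alternating
```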
This is what the PALM tree algorithm does for partially additive (generalized) linear models. It is implemented in the `palmtree` package in R. For the methodological background, see: Heidi Seibold, Torsten Hothorn, Achim Zeileis (2019). "Generalised Linear Model Trees with Global Additive Effects." Advances in Data Analysis and Classification, 13(3), 703-725. doi:10.1007/s11634-018-0342-1
A replication of the empirical illustration in the paper is provided in: https://www.zeileis.org/news/palmtree/
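If I recall the palmtree interface correctly, it uses a three-part formula separating global regressors, node-specific regressors, and partitioning variables (please double-check the exact ordering against `?palmtree`). Under that assumption, a model with a global intercept and node-specific slopes might be specified roughly as:

```r
library("palmtree")

## assumed formula layout: global part | node-specific part | partitioning variables
pt <- palmtree(Sepal.Length ~ 1 | Sepal.Width | Species, data = iris)
```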