
Is it possible to establish splitting criteria in partykit::mob() with a model and then fit a different model to terminal nodes?


Sometimes when working with this package, I only want to assess heterogeneity in one parameter or another. However, I don't think I can do that and then fit a more complete model to the terminal nodes in one step. Is there a way to do that? Here's what I think the code should look like, but it does not work:

library(partykit)

full_mod <- function(y, x, weights = NULL, start = NULL, offset = NULL, ...) {
  lm(y ~ x + 1, ...)
}

tree_1 <- mob(
  # assess heterogeneity in slope, ignoring intercepts
  Sepal.Length ~ 0 + Sepal.Width | Species,
  data = iris,
  # fit each terminal node WITH intercepts
  fit = full_mod
)

The following two-step approach achieves what I want to do, but I'm looking for a single-step way:

tree2 <- 
  lmtree(
    Sepal.Length ~ 0 + Sepal.Width | Species, 
    data = iris
    )

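library(dplyr)

# step 1: record the terminal node of each observation
iris <- iris %>%
  mutate(prediction = predict(tree2, type = 'node'))

# step 2: refit a model WITH intercept within each terminal node
# (nest_by() already returns a rowwise data frame)
lms <- iris %>%
  nest_by(prediction) %>%
  summarize(linear_model = list(lm(Sepal.Length ~ Sepal.Width, data = data)))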

I see that this is not the best method here with continuous variables, but with dichotomous predictors I think this could be very powerful, and I would like to write some code to do this and assess this variant of the model (as long as there is not another way to do it).

ADDED ON 1st EDIT: Perhaps an alternative way to fit this type of model would be to optimize fit based on homogeneity in a chosen regression parameter (rather than the entire model-based deviance, log-likelihood, etc.). I'm happy with either solution, but (personally) had more trouble with the latter.

Thank you! Christopher Loan


Solution

  • In mob_control() you can specify the parm argument, so that only a subset of the parameters, say parm = 2 (the second parameter) or parm = "x" (the coefficient of x), is tested for parameter instability.

    However, the catch is that once a variable is selected for splitting, then the best split point is searched by optimizing the overall objective function (e.g., error sum of squares or log-likelihood etc.) of the model. Thus, this will be sensitive to all changes in all parameters of the model.
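
As a minimal sketch of the parm approach on the iris example from the question: the model below keeps an intercept in every node, but only the slope enters the parameter instability tests. It assumes that lmtree() forwards parm to mob_control() and that the coefficients are ordered intercept first, slope second. The object name tree_parm is just for illustration.

```r
library(partykit)

# Node model WITH intercept; only the 2nd coefficient (the slope of
# Sepal.Width) is used in the parameter instability tests.
# Assumption: coefficient order is (Intercept), Sepal.Width.
tree_parm <- lmtree(
  Sepal.Length ~ Sepal.Width | Species,
  data = iris,
  parm = 2  # forwarded to mob_control()
)

print(tree_parm)

# intercept and slope for each terminal node
coef(tree_parm)
```

Keep in mind the caveat above: the split point itself is still chosen by the overall objective function, so changes in the intercept can still influence where the split is placed.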

    A better alternative for fixing some parameters globally and only splitting with respect to others is to iterate between:

    1. Estimating the (generalized) linear model given the subgroups from the tree.
    2. Estimating the tree (and its subgroups) while keeping the global parameters of the model fixed.

    This is what the PALM tree algorithm does for partially additive (generalized) linear models. It is implemented in the palmtree package in R. For the methodological background see: Heidi Seibold, Torsten Hothorn, Achim Zeileis (2019). "Generalised Linear Model Trees with Global Additive Effects." Advances in Data Analysis and Classification, 13(3), 703-725. doi:10.1007/s11634-018-0342-1

    A replication of the empirical illustration in the paper is provided in: https://www.zeileis.org/news/palmtree/
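
A rough sketch of what a PALM tree call might look like, using simulated data rather than the data from the paper. The three-part formula (node-specific regressors | global regressors | partitioning variables) follows my reading of the palmtree documentation, so please check ?palmtree before relying on it. The data frame d and all of its variables are hypothetical.

```r
library(palmtree)

# Simulated data (hypothetical): the effect of `a` changes sign with z1,
# while the effect of x1 is the same for all observations.
set.seed(1)
n <- 500
d <- data.frame(
  a  = rbinom(n, 1, 0.5),
  x1 = rnorm(n),
  z1 = runif(n, -1, 1),
  z2 = runif(n, -1, 1)
)
d$y <- 1 + 0.5 * d$x1 + ifelse(d$z1 > 0, 1, -1) * d$a + rnorm(n, sd = 0.5)

# Assumed formula layout: y ~ node-specific | global | partitioning
#   a       : coefficient (and intercept) may differ between subgroups
#   x1      : coefficient is held fixed across all subgroups
#   z1 + z2 : candidate splitting variables
palm <- palmtree(y ~ a | x1 | z1 + z2, data = d)

print(palm)
```

Here the tree is grown only with respect to the node-specific part, while the coefficient of x1 is estimated once for the whole sample, which is the iteration between steps 1 and 2 described above.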