r tree party

Surrogate splitting with model-based recursive partitioning (partykit, R)


I am estimating a simple lmtree model using the partykit package in R. In this estimation I have a dependent variable y, an explanatory variable x, and a set of partitioning variables z.

Some of my partitioning variables have many missing values, and I noticed that the sample size of the final estimated model excludes every observation with a missing value in any of them. In other words, as soon as at least one partitioning variable is missing in a row of my data frame, the entire row is dropped from the estimation and I lose the information provided by the remaining non-missing partitioning variables.

In more traditional conditional inference tree estimation, this problem is solved with surrogate splitting (for example, the ctree_control function from partykit lets you set maxsurrogate, the number of surrogate splits evaluated in the ctree estimation).
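For reference, a minimal sketch of what surrogate splitting looks like with ctree. The simulated data and the variable names (d, z1, z2) are made up for illustration; only ctree, ctree_control and its maxsurrogate argument come from partykit.

```r
library(partykit)

# Hypothetical data: z2 is strongly correlated with z1, so it can act as a surrogate
set.seed(1)
n  <- 200
z1 <- rnorm(n)
z2 <- z1 + rnorm(n, sd = 0.3)
y  <- ifelse(z1 > 0, 2, -2) + rnorm(n)
d  <- data.frame(y, z1, z2)

# Knock out some values of the main partitioning variable
d$z1[sample(n, 40)] <- NA

# Ask ctree to evaluate one surrogate split per node
ct <- ctree(y ~ z1 + z2, data = d,
            control = ctree_control(maxsurrogate = 1))

# Rows with a missing z1 are still assigned to a terminal node
node <- predict(ct, newdata = d, type = "node")
```

Here the rows with a missing z1 stay in the data and get a terminal node, which is the behaviour I would like to have in lmtree.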

Is it possible to perform surrogate splitting in lmtree (model-based recursive partitioning) as well?


Solution

  • At the moment the partykit implementation of mob (and hence lmtree and glmtree) does not provide surrogate splits.

    We are working on a reimplementation in which both ctree and mob can be used with surrogate variables, and both can be used in the background for lmtree, glmtree, etc.

    For now, the best solution when you need model-based recursive partitioning with surrogate splits is to use ctree with a custom model-based ytrafo function. This will use the CTree algorithm in the background (rather than MOB), but the two often yield rather similar results. From an applied perspective, the more important difference is that lmtree provides various convenience features, especially for plot and predict, that ctree does not have.
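A rough sketch of the idea, under stated assumptions. This is a simplification and not the node-wise ytrafo interface itself: the model scores (residuals times regressors) are computed once from a global lm fit and passed to ctree as a multivariate response, whereas a proper ytrafo function would refit the model in every node. The simulated data and all variable names (d, s_int, s_x, node) are hypothetical.

```r
library(partykit)

# Hypothetical data: the slope of x depends on z1; z2 is correlated with z1
set.seed(42)
n  <- 300
z1 <- rnorm(n)
z2 <- z1 + rnorm(n, sd = 0.3)
x  <- rnorm(n)
y  <- ifelse(z1 > 0, 2, -2) * x + rnorm(n)
d  <- data.frame(y, x, z1, z2)
d$z1[sample(n, 60)] <- NA          # missing values in a partitioning variable

# Score contributions of the global linear model y ~ x
m0 <- lm(y ~ x, data = d)
d$s_int <- residuals(m0)           # score for the intercept
d$s_x   <- residuals(m0) * d$x     # score for the slope of x

# CTree on the scores, with one surrogate split per node
ct <- ctree(s_int + s_x ~ z1 + z2, data = d,
            control = ctree_control(maxsurrogate = 1))

# Assign every row (including those with missing z1) to a terminal node,
# then refit y ~ x separately within each node
d$node <- predict(ct, newdata = d, type = "node")
fits   <- lapply(split(d, d$node), function(dd) coef(lm(y ~ x, data = dd)))
```

The per-node refit at the end stands in for the plot and predict conveniences that lmtree would otherwise provide; fits is a list of coefficient vectors, one per terminal node.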