I am estimating a simple `lmtree` model using the `partykit` library in R.
In this estimation I have a dependent variable y, an explanatory variable x, and a set of partitioning variables z.
Some of my partitioning variables have many missing values, and I noticed that the sample size of the final estimated model excludes every row with a missing value in any of those variables. In other words, as soon as at least one partitioning variable is missing for a row in my data frame, the entire row is dropped from the estimation, and I lose the information provided by the remaining non-missing partitioning variables.
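To make the problem concrete, here is a minimal sketch on simulated data. The variable names (`y`, `x`, `z1`, `z2`) and the data-generating process are made up for illustration, and the number of rows used is read off the fitted node ids, which is an assumption about a convenient way to count them.

```r
library(partykit)

set.seed(42)
n <- 300
# Hypothetical data: the slope of x changes with z1; z2 is pure noise
d <- data.frame(x = rnorm(n), z1 = runif(n), z2 = runif(n))
d$y <- ifelse(d$z1 > 0.5, 2, -2) * d$x + rnorm(n, sd = 0.5)

# Make one partitioning variable heavily incomplete
d$z2[sample(n, 100)] <- NA

tree <- lmtree(y ~ x | z1 + z2, data = d)

# Rows that actually entered the estimation vs. rows available
n_used     <- length(predict(tree, type = "node"))
n_complete <- sum(complete.cases(d))
c(rows = nrow(d), complete = n_complete, used = n_used)
```

Even though `z1` is fully observed, every row with a missing `z2` is dropped before the tree is grown.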
In more traditional conditional inference tree estimation, this problem is handled with surrogate splitting (for example, the `ctree_control` function from `partykit` lets you set `maxsurrogate`, the number of surrogate splits evaluated in the `ctree` estimation).
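For reference, this is roughly what I mean for `ctree`. It is only a sketch using the built-in `airquality` data, where some predictors (e.g. `Solar.R`) contain missing values; the choice of `maxsurrogate = 3` is arbitrary.

```r
library(partykit)

# Keep only rows with an observed response; predictors may still be NA
aq <- subset(airquality, !is.na(Ozone))

# Allow up to 3 surrogate splits per inner node
ct <- ctree(Ozone ~ ., data = aq,
            control = ctree_control(maxsurrogate = 3))

# Rows with missing predictors still get a prediction
pred <- predict(ct, newdata = aq)
length(pred) == nrow(aq)
```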
Is it possible to perform surrogate splitting in `lmtree` (model-based recursive partitioning) as well?
At the moment the `partykit` implementation of `mob` (and hence `lmtree` and `glmtree`) does not provide surrogate splits.
We are working on a reimplementation in which both `ctree` and `mob` can be used with surrogate variables, and either can serve as the backend for `lmtree`, `glmtree`, etc.
For now, the best solution when you need model-based recursive partitioning with surrogate splits is to use `ctree` with a custom model-based `ytrafo` function. This uses the CTree algorithm in the background (rather than MOB), but the two often yield rather similar results. From an applied perspective, the more important difference is that `lmtree` provides various convenience features, especially for `plot` and `predict`, that `ctree` does not have.
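As a rough illustration of the idea, the sketch below does not use the `ytrafo` interface itself. Instead it approximates a model-based transformation by computing the score contributions of a linear model once, on the full sample, and passing them to `ctree` as a bivariate response with surrogate splits enabled. The data, the variable names (`s_int`, `s_x`, `node`), and the refitting step are all assumptions made for illustration; a proper `ytrafo` would re-estimate the model in every node.

```r
library(partykit)

set.seed(1)
n <- 400
# Hypothetical data: the slope of x changes with z1; z2 is pure noise
d <- data.frame(x = rnorm(n), z1 = runif(n), z2 = runif(n))
d$y <- ifelse(d$z1 > 0.5, 2, -2) * d$x + rnorm(n, sd = 0.5)

# Missing values in the partitioning variables only
d$z2[sample(n, 150)] <- NA
d$z1[sample(n, 40)]  <- NA

# Score contributions of the root model y ~ x:
# residual (intercept score) and residual * x (slope score)
m0 <- lm(y ~ x, data = d)
d$s_int <- residuals(m0)
d$s_x   <- residuals(m0) * d$x

# Partition on the scores with CTree, allowing surrogate splits
ct <- ctree(s_int + s_x ~ z1 + z2, data = d,
            control = ctree_control(maxsurrogate = 2))

# Assign every row (including those with NAs) to a terminal node
d$node <- predict(ct, newdata = d, type = "node")

# Refit the linear model separately within each terminal node
fits <- t(sapply(split(d, d$node), function(s) coef(lm(y ~ x, data = s))))
fits
```

All rows are retained here because `y` and `x` are complete, and rows with a missing partitioning variable are sent down the tree via surrogate splits. What is lost relative to `lmtree` is exactly the convenience mentioned above: the per-node models have to be refitted and plotted by hand.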