r glm interaction variance

Relative importance/Variation partitioning in a GLM containing an interaction


I have a question regarding the relative importance of variables in a GLM that contains an interaction (continuous * factor).

I am experimenting with an approach based on partitioning the explained variation, approximated through (pseudo-)R-squared. But I am unsure how to do that (1) in a GLM, and (2) with a model that contains an interaction.

For simplicity, I have prepared an example model: a Gaussian GLM with a single interaction (using the mtcars dataset, see code at the end of the post). But I am actually interested in applying the method to a Generalized Poisson GLM, which might contain multiple interactions. A few questions arise from the test model:

  1. How to partition the R-squared correctly? I have attempted a partition, but I’m unsure if that’s the right way.
  2. The R-squared values of the individual terms do not add up to the R-squared of the full model (not even close). This also happens with a model that contains no interaction. Aside from mistakes in partitioning the R-squared (I still consider myself a newbie to stats :P), could this also be influenced by collinearity? The variance inflation factors are below 3 after scaling the continuous predictor (in a model with no scaling, the highest VIF is 5.7).

Any help is much appreciated!


library(tidyverse)
library(rsq)
library(car)

data <- mtcars %>%
  # scale reduces collinearity: without standardizing, the variance inflation factor for the factor is 5.7
  mutate(disp = as.numeric(scale(disp)))  # as.numeric() avoids storing a one-column matrix
data$am <- factor(data$am)

summary(data)

# test model, continuous response (miles per gallon), type of transmission (automatic/manual) as factor, displacement as continuous
model <-
  glm(mpg ~ am + disp + am:disp,
      data = data,
      family = gaussian(link = "identity"))
drop1(model, test = "F")

# graph the data
ggplot(data = data, aes(x = disp, y = mpg, col = am)) + geom_jitter() + geom_smooth(method = "glm")

# Attempted partitioning
(rsq_full <- rsq::rsq(model, adj = TRUE, type = "v"))

(rsq_int <- rsq_full - rsq::rsq(update(model, . ~ . - am:disp), adj = TRUE, type = "v"))

(rsq_factor <- rsq_full - rsq::rsq(update(model, . ~ . - am - am:disp), adj = TRUE, type = "v"))

(rsq_cont <- rsq_full - rsq::rsq(update(model, . ~ . - disp - am:disp), adj = TRUE, type = "v"))

c(rsq_full, rsq_int + rsq_factor + rsq_cont)

car::vif(model)


# A simpler model with no interaction
model2 <- glm(mpg ~ am + disp, data = data, family = gaussian(link = "identity"))
drop1(model2, test = "F")

(rsq_full2 <- rsq::rsq(model2, adj = TRUE, type = "v"))
(rsq_factor2 <- rsq_full2 - rsq::rsq(update(model2, . ~ . - am), adj = TRUE, type = "v"))
(rsq_cont2 <- rsq_full2 - rsq::rsq(update(model2, . ~ . - disp), adj = TRUE, type = "v"))

c(rsq_full2, rsq_factor2 + rsq_cont2)

car::vif(model2)



Solution

  • Thanks to Joseph Luchman for sharing this methodology over at GitHub!

    A few strategies to deal with the relative importance of interactions are presented in: LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013). Residualized relative importance analysis: A technique for the comprehensive decomposition of variance in higher order regression models. Organizational Research Methods, 16(3), 449-473.

    They discuss the strategy of calculating the increase in R-squared when the interaction is added. Moreover, the contribution of the interaction "alone" to the R-squared can be obtained through residualization: regress the product of the two variables on their additive (main) effects, and use the residuals of that regression as the interaction term. See https://github.com/jluchman/domir/discussions/5

    The resulting residualized interaction can then be used in dominance analysis, which takes a model and splits its R-squared (or another metric of model quality, such as AUC) into separate contributions from each of the independent variables. It can be implemented in R with the domir package (https://github.com/jluchman/domir).
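
The residualization step and the R-squared split described above can be sketched in base R as follows. This is only an illustration under some assumptions: it uses ordinary least squares on mtcars (equivalent to the Gaussian identity-link GLM in the question), the column names am_num, disp_z, prod and int_res are made up for the example, and the general dominance values are computed by hand as a weighted average of R-squared increments over all subsets of predictors rather than by calling domir, so no package API is assumed here.

```r
# Sketch only: residualized interaction plus a hand-rolled general dominance
# split of R-squared. All new column names are illustrative.
d <- mtcars
d$am_num <- d$am                         # 0/1 transmission indicator
d$disp_z <- as.numeric(scale(d$disp))    # standardized displacement
d$prod   <- d$am_num * d$disp_z          # raw product term

# Residualized interaction: the part of the product that the
# main effects cannot explain
d$int_res <- resid(lm(prod ~ am_num + disp_z, data = d))

# R-squared of an OLS fit of mpg on a set of predictors (0 for the empty set)
r2 <- function(preds) {
  if (length(preds) == 0) return(0)
  summary(lm(reformulate(preds, response = "mpg"), data = d))$r.squared
}

preds <- c("am_num", "disp_z", "int_res")
p <- length(preds)

# General dominance: weighted average of each term's R-squared increment
# over every subset of the remaining terms
general_dominance <- sapply(preds, function(term) {
  others <- setdiff(preds, term)
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(others),
                             function(k) combn(others, k, simplify = FALSE)),
                      recursive = FALSE))
  sum(sapply(subsets, function(s) {
    w <- factorial(length(s)) * factorial(p - length(s) - 1) / factorial(p)
    w * (r2(c(s, term)) - r2(s))
  }))
})

# The parts should sum to the R-squared of the full interaction model
full_r2 <- summary(lm(mpg ~ am * disp, data = mtcars))$r.squared
print(round(general_dominance, 3))
print(c(sum_of_parts = sum(general_dominance), full_model = full_r2))
```

Because the residualized term is orthogonal to both main effects, its share equals the increase in R-squared obtained when the interaction is added to the additive model, and the three shares sum to the full-model R-squared. For a Poisson-type GLM the same scheme would presumably need r2() replaced by a pseudo-R-squared of the fitted GLM, which this sketch does not attempt.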