rstatisticsglmmultivariate-testing

Multiple Outcome Binomial Regression R


It is possible to use cbind to specify multiple outcomes for plain lm regressions as such:

set.seed(11)
df <- iris %>%
  mutate(var1 = sample(c(0L,1L), 150, replace = TRUE),
         var2 = sample(c(0L,1L), 150, replace = TRUE))

df %>%
  lm(cbind(var1, var2) ~ Sepal.Width + Sepal.Length, data = .) %>%
  coef()

#                      var1        var2
# (Intercept)  0.5010092855  1.17035845
# Sepal.Width  0.0015431925 -0.16210972
# Sepal.Length 0.0001607519 -0.03218511

Can this be done with a binomial glm? If not, how should one perform the intended analysis (MANOVA)?

df %>%
  glm(cbind(var1, var2) ~ Sepal.Width + Sepal.Length, family = binomial, data = .) %>%
  coef()

# (Intercept)  Sepal.Width Sepal.Length 
#  -1.3886130    0.3587017    0.0588412 

The glm above does not appear to be producing a matrix of coefficients as with lm, the output is the same format as if only one outcome is specified.


Solution

  • Not this way. lm() allows this because the specific structure of linear regression allows the linear regression equations to be solved simultaneously (and efficiently) for several response variables. The trick doesn't work for generalized linear models.

    To do this with glm() you have to set it up yourself, e.g.

    ff <- function(var) {
       f <- reformulate(c("Sepal.Length", "Sepal.Width"), response = var)
       glm(f, data = df, family = binomial)
    }
    (c("var1", "var2")
       |> purrr::map(ff)
       |> purrr::map(coef)
       |> bind_cols()
    )
    

    (you still need to sort out the row/column names)

    or

    library(broom)
    (c("var1", "var2")
       |> setNames(nm = _)
       |> purrr::map(ff)
       |> purrr::map_dfr(tidy, .id = "respvar")
       |> select(respvar, term, estimate)
       |> pivot_wider(names_from = "respvar", values_from = "estimate")
    )
    

    lme4::lmList() handles the case where the data are in long format:

    library(broom.mixed)
    library(lme4)
    (df 
       |> pivot_longer(c(var1, var2))
       |> lmList(formula = value ~ Sepal.Length + Sepal.Width | name,
                 family = binomial)
       |> coef()
       |> t() ## transpose
    )
    

    There are lots of other questions on SO that do variants of this (run lm()/glm() for different response variables, on elements of a list, etc etc), but I can't find them easily.