Is there a way to do multivariate logistic regression using glm()? I have several binary outcomes and I know you can do this with linear regression (lm()) and cbind() but I can't seem to figure out how to do it with glm() and cbind()
library(tidyverse)
lm(cbind(mpg, cyl, disp) ~ hp + drat + wt + qsec, data = mtcars) %>%
broom::tidy(conf.int = T)
This returns a tidy tibble with your outcomes (mpg, cyl, disp). How can I extend this to logistic regression.
df <- mtcars %>%
mutate(across(c(vs, am), factor))
glm(cbind(am, vs) ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>%
broom::tidy()
I know this isn't the best dataset to use (probably due to complete separation) but cbind just returns an unexpected output than if you ran the glms separately.
glm(am ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>%
broom::tidy()
glm(vs ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>%
broom::tidy()
Have a look at
glm(cbind(am, vs) ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df, method = "model.frame") %>%
broom::tidy()
which returns
# A tibble: 5 × 13
column n mean sd median trimmed mad min max range skew kurtosis se
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 cbind(am, vs) 64 1.42 0.498 1 1.40 0 1 2 1 0.316 1.10 0.0622
2 hp 32 147. 68.6 123 141. 52 52 335 283 0.761 3.05 12.1
3 drat 32 3.60 0.535 3.70 3.58 0.475 2.76 4.93 2.17 0.279 2.44 0.0945
4 wt 32 3.22 0.978 3.32 3.15 0.517 1.51 5.42 3.91 0.444 3.17 0.173
5 qsec 32 17.8 1.79 17.7 17.8 0.955 14.5 22.9 8.4 0.387 3.55 0.316
In the first line of results in the output above, you can see that glm
seems to interpret cbind(am, vs)
as a combined dependent variable of am and vs having multiple categories and not being dichotomous (in this example 0 or 1). This kind of dependent variable would require multinomial logistic regression as pointed out by Ben Bolker.
As you mentioned, if you run seperate glm
models with one dependent variable at a time, you get different results and warning messages regarding complete separation as well as fitted probabilities having values (very close to) 0 or 1.
If you would like to run the two glm
models in one go and present results in a single object, we can use purrr:map
as shown here which results in the following approach:
#--------------
# Load packages
#--------------
library(tidyverse)
#--------------
# Define data, including independent variables & dependent variables
#--------------
df <- mtcars
independent.variables.formula <- "~ hp + drat + wt + qsec"
dependent.variables <- c("am", "vs")
#--------------
# create output for both models in a data.frame showing the results
#--------------
df_res <- map(dependent.variables, function(DV) {
paste(DV, independent.variables.formula) %>%
as.formula %>%
glm(family=binomial(link = "logit"), data = df) %>%
broom::tidy(conf.int = T)
}) %>%
bind_rows() %>%
as.data.frame () %>%
mutate (DV = c(rep ("am", 5), rep ("vs", 5)),
across(c(2:4, 6:7), .fns = function(x) {format(round(x, 2), nsmall = 2)})) %>%
relocate (DV, .before = term)
df_res[ ,6] <- format.pval(df_res[ ,6], eps = .001, digits = 3) # format small p-values < 0.001 nicely
df_res
DV term estimate std.error statistic p.value conf.low conf.high
1 am (Intercept) 206.22 1166788.76 0.00 1.000 NA 238011.23
2 am hp 0.24 687.34 0.00 1.000 -139.85 NA
3 am drat 103.37 132242.73 0.00 0.999 -10359.33 12068.44
4 am wt -94.56 84508.03 0.00 0.999 -54340.19 14115.43
5 am qsec -20.03 34601.06 0.00 1.000 -3473.64 3149.50
6 vs (Intercept) -1928.91 1845493.29 0.00 0.999 -124797.75 120939.92
7 vs hp 1.59 2408.34 0.00 0.999 -489.26 NA
8 vs drat 92.35 139959.57 0.00 0.999 -16047.24 16231.94
9 vs wt -82.30 77899.01 0.00 0.999 -5580.25 4963.40
10 vs qsec 91.59 75674.95 0.00 0.999 -3734.28 4115.71