I have a question about how to compare coefficients in a multivariate regression in R.
I conducted a survey in which I measured three different attitudes (scale variables). My goal is to estimate whether some characteristics of the respondents (age, gender, education and ideological position) can explain their (positve/negative) attitudes.
I was advised to conduct a multivariate multiple regression instead of three univariate multiple regression. The code of my multivariate model is:
MMR <- lm(cbind(Attitude_1, Attitude_2, Attitude_3) ~
Age + Gender + Education + Ideological_position,
data = survey)
summary(MMR)
What I am trying to do next is to estimate whether the coefficients of let's say 'Gender' are statistically significant across the three individual models.
I found a very clear instruction how to do this in Stata (https://stats.idre.ucla.edu/stata/dae/multivariate-regression-analysis/), but I don't have a license, so I have to find an alternative in R. I know a similar question has been asked here before (R - Testing equivalence of coefficients in multivariate multiple regression), but the answer was that there does not exist a package (or function) in R which can be used for this purpose. Because this answer was provided a few years back, I was wondering whether in the meantime some new packages or functions are implemented.
More precisely, I was wondering whether I can use the linearHypothesis() function (https://www.rdocumentation.org/packages/car/versions/3.0-11/topics/linearHypothesis)? I already know that this function allows me to test, for instance, whether the coefficient of Gender equals to coefficient of Education:
linearhypothesis(MMR, c("GenderFemale", "EducationHigh-educated")
Can I also use this function to test whether the coefficient of Gender in the equation modelling Attitude_1 equals the coefficient of Gender in the equation modelling Attitude_2 or Attitude_3?
Any help would be greatly appreciated!
Since the model presented in the question is not reproducible (the input is missing) let us use this model instead.
fm0 <- lm(cbind(cyl, mpg) ~ wt + hp, mtcars)
We will discuss two approaches using as our linear hypothesis that the intercepts of the cyl and mpg groups are the same, that the wt slopes are the same and the hp slopes are the same.
In this approach we base the entire comparison only on the coefficients and their variance covariance matrix.
library(car)
v <- vcov(fm0)
co <- setNames(c(coef(fm0)), rownames(v))
h1 <- c("cyl:(Intercept) = mpg:(Intercept)", "cyl:wt = mpg:wt", "cyl:hp = mpg:hp")
linearHypothesis(NULL, h1, coef. = co, vcov. = v)
giving:
Linear hypothesis test
Hypothesis:
cyl:((Intercept) - mpg:(Intercept) = 0
cyl:wt - mpg:wt = 0
cyl:hp - mpg:hp = 0
Model 1: restricted model
Model 2: structure(list(), class = "formula", .Environment = <environment>)
Note: Coefficient covariance matrix supplied.
Df Chisq Pr(>Chisq)
1
2 3 878.53 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To explain what linearHypothesis is doing note that In this case the hypothesis matrix is L <- t(c(1, -1)) %x% diag(3)
and given v then as a large sample approximation we have that L %*% co
is distributed as N(0, L %*% v %*% t(L))
under the null hypothesis hence t(L %*% co) %*% solve(L %*% v %*% t(L)) %*% L %*% co
is distributed as chi squared with nrow(L)
degrees of freedom.
L <- t(c(1, -1)) %>% diag(3)
nrow(L) # degrees of freedom
SSH <- t(L %*% co) %*% solve(L %*% v %*% t(L)) %*% L %*% co # chisq
p <- pchisq(SSH, nrow(L), lower.tail = FALSE) # p value
With this approach (which is not equivalent to the first one shown above) convert mtcars from wide to long form, mt2. We show how to do that using reshape or pivot_longer at the end but for now we will just form it explicitly. Define lhs as the 32x2 matrix on the left hand side of the fm0 formula, i.e. cbind(cyl, mpg). Note that its column names are c("cyl", "mpg"). Stringing out lhs column by column into a 64 long vector of the cyl column followed by the mpg column gives us our new dependent variable y. We also form a grouping variable g. the same length as y which indicates which column in lhs the corresponding element of y is from.
With mt2 defined we can form fm1. In forming fm1 We will use a weight vector w based on the fm0 sigma values to reflect the fact that the two groups, cyl and mpg, have different values of sigma given by the vector sigma(fm0).
We show below that the fm0 and fm1 models have the same coefficients and then run linearHypothesis.
library(car)
lhs <- fm0$model[[1]]
g. <- colnames(lhs)[col(lhs)]
y <- c(lhs)
mt2 <- with(mtcars, data.frame(wt, hp, g., y))
w <- 1 / sigma(fm0)[g.]^2
fm1 <- lm(y ~ g./(wt + hp) + 0, mt2, weights = w)
# note coefficient names
variable.names(fm1)
## [1] "g.cyl" "g.mpg" "g.cyl:wt" "g.mpg:wt" "g.cyl:hp" "g.mpg:hp"
# check that fm0 and fm1 have same coefs
all.equal(c(t(coef(fm0))), coef(fm1), check.attributes = FALSE)
## [1] TRUE
h2 <- c("g.mpg = g.cyl", "g.mpg:wt = g.cyl:wt", "g.mpg:hp = g.cyl:hp")
linearHypothesis(fm1, h2)
giving:
Linear hypothesis test
Hypothesis:
- g.cyl + g.mpg = 0
- g.cyl:wt + g.mpg:wt = 0
- g.cyl:hp + g.mpg:hp = 0
Model 1: restricted model
Model 2: y ~ g./(wt + hp) + 0
Res.Df RSS Df Sum of Sq F Pr(>F)
1 61 1095.8
2 58 58.0 3 1037.8 345.95 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If L is the hypothesis matrix which is the same as L in (1) except the columns are reordered, q is its number of rows, n is the number of rows of mt2 then SSH/q is distributed F(q, n-q-1) so we have:
n <- nrow(mt2)
L <- diag(3) %x% t(c(1, -1)) # note difference from (1)
q <- nrow(L)
SSH <- t(L %*% coef(fm1)) %*% solve(L %*% vcov(fm1) %*% t(L)) %*% L %*% coef(fm1)
SSH/q # F value
pf(SSH/q, q, n-q-1, lower.tail = FALSE) # p value
An alternative to linearHypothesis is to define the reduced model and then compare the two models using anova. mt2 and w are from above. No packages are used.
fm2 <- lm(y ~ hp + wt, mt2, weights = w)
anova(fm2, fm1)
giving:
Analysis of Variance Table
Model 1: y ~ hp + wt
Model 2: y ~ g./(wt + hp) + 0
Res.Df RSS Df Sum of Sq F Pr(>F)
1 61 1095.8
2 58 58.0 3 1037.8 345.95 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
An alternate way to form mt2 is by reshaping mtcars from wide form to long form using reshape.
mt2a <- mtcars |>
reshape(dir = "long", varying = list(colnames(lhs)), v.names = "y",
timevar = "g.", times = colnames(lhs)) |>
subset(select = c("wt", "hp", "g.", "y"))
or using tidyverse (which has rows in a different order but that should not matter as long as mat2b is used consistently in forming fm1 and w.
library(dplyr)
library(tidyr)
mt2b <- mtcars %>%
select(mpg, cyl, wt, hp) %>%
pivot_longer(all_of(colnames(lhs)), names_to = "g.", values_to = "y")