rcoefficientsmultivariate-testing

R: testing whether a coefficient is equal across the different equations in a multivariate regression (using linearhypothesis())?


I have a question about how to compare coefficients in a multivariate regression in R.

I conducted a survey in which I measured three different attitudes (scale variables). My goal is to estimate whether some characteristics of the respondents (age, gender, education and ideological position) can explain their (positve/negative) attitudes.

I was advised to conduct a multivariate multiple regression instead of three univariate multiple regression. The code of my multivariate model is:

MMR <- lm(cbind(Attitude_1, Attitude_2, Attitude_3) ~ 
             Age + Gender + Education + Ideological_position, 
           data = survey)
summary(MMR)

What I am trying to do next is to estimate whether the coefficients of let's say 'Gender' are statistically significant across the three individual models.

I found a very clear instruction how to do this in Stata (https://stats.idre.ucla.edu/stata/dae/multivariate-regression-analysis/), but I don't have a license, so I have to find an alternative in R. I know a similar question has been asked here before (R - Testing equivalence of coefficients in multivariate multiple regression), but the answer was that there does not exist a package (or function) in R which can be used for this purpose. Because this answer was provided a few years back, I was wondering whether in the meantime some new packages or functions are implemented.

More precisely, I was wondering whether I can use the linearHypothesis() function (https://www.rdocumentation.org/packages/car/versions/3.0-11/topics/linearHypothesis)? I already know that this function allows me to test, for instance, whether the coefficient of Gender equals to coefficient of Education:

linearhypothesis(MMR, c("GenderFemale", "EducationHigh-educated")

Can I also use this function to test whether the coefficient of Gender in the equation modelling Attitude_1 equals the coefficient of Gender in the equation modelling Attitude_2 or Attitude_3?

Any help would be greatly appreciated!


Solution

  • Since the model presented in the question is not reproducible (the input is missing) let us use this model instead.

    fm0 <- lm(cbind(cyl, mpg) ~ wt + hp, mtcars)
    

    We will discuss two approaches using as our linear hypothesis that the intercepts of the cyl and mpg groups are the same, that the wt slopes are the same and the hp slopes are the same.

    1) Mean/Variance

    In this approach we base the entire comparison only on the coefficients and their variance covariance matrix.

    library(car)
    
    v <- vcov(fm0)
    co <- setNames(c(coef(fm0)), rownames(v))
    h1 <- c("cyl:(Intercept) = mpg:(Intercept)", "cyl:wt = mpg:wt", "cyl:hp = mpg:hp")
    
    linearHypothesis(NULL, h1, coef. = co, vcov. = v)
    

    giving:

    Linear hypothesis test
    
    Hypothesis:
    cyl:((Intercept) - mpg:(Intercept) = 0
    cyl:wt - mpg:wt = 0
    cyl:hp - mpg:hp = 0
    
    Model 1: restricted model
    Model 2: structure(list(), class = "formula", .Environment = <environment>)
    
    Note: Coefficient covariance matrix supplied.
    
      Df  Chisq Pr(>Chisq)    
    1                         
    2  3 878.53  < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    To explain what linearHypothesis is doing note that In this case the hypothesis matrix is L <- t(c(1, -1)) %x% diag(3) and given v then as a large sample approximation we have that L %*% co is distributed as N(0, L %*% v %*% t(L)) under the null hypothesis hence t(L %*% co) %*% solve(L %*% v %*% t(L)) %*% L %*% co is distributed as chi squared with nrow(L) degrees of freedom.

    L <- t(c(1, -1)) %>% diag(3)
    nrow(L) # degrees of freedom
    SSH <- t(L %*% co) %*% solve(L %*% v %*% t(L)) %*% L %*% co # chisq
    p <- pchisq(SSH, nrow(L), lower.tail = FALSE) # p value
    

    2) Long form model

    With this approach (which is not equivalent to the first one shown above) convert mtcars from wide to long form, mt2. We show how to do that using reshape or pivot_longer at the end but for now we will just form it explicitly. Define lhs as the 32x2 matrix on the left hand side of the fm0 formula, i.e. cbind(cyl, mpg). Note that its column names are c("cyl", "mpg"). Stringing out lhs column by column into a 64 long vector of the cyl column followed by the mpg column gives us our new dependent variable y. We also form a grouping variable g. the same length as y which indicates which column in lhs the corresponding element of y is from.

    With mt2 defined we can form fm1. In forming fm1 We will use a weight vector w based on the fm0 sigma values to reflect the fact that the two groups, cyl and mpg, have different values of sigma given by the vector sigma(fm0).

    We show below that the fm0 and fm1 models have the same coefficients and then run linearHypothesis.

    library(car)
    
    lhs <- fm0$model[[1]]
    g. <- colnames(lhs)[col(lhs)]
    y <- c(lhs)
    mt2 <- with(mtcars, data.frame(wt, hp, g., y))
      
    w <- 1 / sigma(fm0)[g.]^2
    fm1 <- lm(y ~ g./(wt + hp) + 0, mt2, weights = w)
    
    # note coefficient names
    variable.names(fm1)
    ## [1] "g.cyl" "g.mpg" "g.cyl:wt" "g.mpg:wt" "g.cyl:hp" "g.mpg:hp"
    
    # check that fm0 and fm1 have same coefs
    all.equal(c(t(coef(fm0))), coef(fm1), check.attributes = FALSE)
    ## [1] TRUE
    
    h2 <- c("g.mpg = g.cyl", "g.mpg:wt = g.cyl:wt", "g.mpg:hp = g.cyl:hp")
    linearHypothesis(fm1, h2)
    

    giving:

    Linear hypothesis test
    
    Hypothesis:
    - g.cyl  + g.mpg = 0
    - g.cyl:wt  + g.mpg:wt = 0
    - g.cyl:hp  + g.mpg:hp = 0
    
    Model 1: restricted model
    Model 2: y ~ g./(wt + hp) + 0
    
      Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
    1     61 1095.8                                  
    2     58   58.0  3    1037.8 345.95 < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    If L is the hypothesis matrix which is the same as L in (1) except the columns are reordered, q is its number of rows, n is the number of rows of mt2 then SSH/q is distributed F(q, n-q-1) so we have:

    n <- nrow(mt2)
    L <- diag(3) %x% t(c(1, -1))  # note difference from (1)
    q <- nrow(L)
    SSH <- t(L %*% coef(fm1)) %*% solve(L %*% vcov(fm1) %*% t(L)) %*% L %*% coef(fm1)
    SSH/q # F value
    pf(SSH/q, q, n-q-1, lower.tail = FALSE) # p value
    

    anova

    An alternative to linearHypothesis is to define the reduced model and then compare the two models using anova. mt2 and w are from above. No packages are used.

    fm2 <- lm(y ~ hp + wt, mt2, weights = w)
    anova(fm2, fm1)
    

    giving:

    Analysis of Variance Table
    
    Model 1: y ~ hp + wt
    Model 2: y ~ g./(wt + hp) + 0
      Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
    1     61 1095.8                                  
    2     58   58.0  3    1037.8 345.95 < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    Alternate wide to long calculation

    An alternate way to form mt2 is by reshaping mtcars from wide form to long form using reshape.

    mt2a <- mtcars |>
      reshape(dir = "long", varying = list(colnames(lhs)), v.names = "y",
        timevar = "g.", times = colnames(lhs)) |>
      subset(select = c("wt", "hp", "g.", "y"))
    

    or using tidyverse (which has rows in a different order but that should not matter as long as mat2b is used consistently in forming fm1 and w.

    library(dplyr)
    library(tidyr)
    
    mt2b <- mtcars %>%
      select(mpg, cyl, wt, hp) %>%
      pivot_longer(all_of(colnames(lhs)), names_to = "g.", values_to = "y")