rcoefficientsmultinomialregression-testing

R code to test the difference between coefficients of regressors from a multinomial logit regression


I have the following multinomial logit regression

FLAG ~ Cash + Debt + Other variables

Where FLAG is a dummy with three levels, that distinguishes three different types of firms, while Cash and Debt are some variables regarding those firms.
The results of the regression are the following

Variables         Type 0 vs Type 1          Type 0 vs Type 2
Cash                  0.543***                   0.321***
Debt                  -0.124***                  0.452***

I now want to test whether the difference between these coefficients is statistically significant (in this case, between 0.543 and 0.321, and between -0.124 and 0.452)
I know I have to use the Wald test (as it was used in some papers I looked at, that performed the same analysis), but I don't know how to implement it on R.
What's the R code to implement a Wald test of the difference between the coefficients of the model?


Solution

  • Here's an example using the Chile data from the carData package.

    data(Chile, package="carData")
    mod <- nnet::multinom(vote ~ sex + I(income/10000), data=Chile)
    #> # weights:  16 (9 variable)
    #> initial  value 3395.034890 
    #> iter  10 value 3105.730891
    #> final  value 3043.519706 
    #> converged
    summary(mod)
    #> Call:
    #> nnet::multinom(formula = vote ~ sex + I(income/10000), data = Chile)
    #> 
    #> Coefficients:
    #>   (Intercept)          sexM I(income/10000)
    #> N    1.212092  0.5676465119      0.02101608
    #> U    1.430206 -0.2235933346     -0.06761914
    #> Y    1.477171 -0.0003631522      0.02018734
    #> 
    #> Std. Errors:
    #>   (Intercept)      sexM I(income/10000)
    #> N   0.1326298 0.1655421      0.02108768
    #> U   0.1352691 0.1738994      0.02478598
    #> Y   0.1300724 0.1657089      0.02108419
    #> 
    #> Residual Deviance: 6087.039 
    #> AIC: 6105.039
    

    Let's say that in the model above, you wanted to compare the coefficients for sexM for the categories N and U. You could save the variance-covariance matrix of the estimators in v and the coefficients themselves in a vector called b.

    v <- vcov(mod)
    b <- c(t(coef(mod)))
    names(b) <- rownames(v)
    

    Then, you could create the relevant z-statistic by dividing the difference in coefficients by its standard error (the square root of the sum of the two variances minus two times the covariance of the relevant parameters):

    z_wt <- (b["N:sexM"] - b["U:sexM"])/sqrt(v["N:sexM", "N:sexM"] + v["U:sexM", "U:sexM"] - 2*v["N:sexM", "U:sexM"])
    

    Then, you could use pnorm() to calculate the p-value.

    2*pnorm(abs(z_wt), lower.tail=FALSE)
    #>       N:sexM 
    #> 1.255402e-12
    

    Another option would be to just change the reference category. For example if I wanted the U-N comparison, I could change the reference to either U or N and re-estimate the model:

    Chile$vote2 <- relevel(Chile$vote, "U")
    mod2 <- nnet::multinom(vote2 ~ sex + I(income/10000), data=Chile, maxit=1000)
    #> # weights:  16 (9 variable)
    #> initial  value 3395.034890 
    #> iter  10 value 3081.860210
    #> final  value 3043.519705 
    #> converged
    z_wt2 <- coef(mod2)["N", "sexM"]/summary(mod2)$standard.errors["N", "sexM"]
    
    2*pnorm(abs(z_wt2), lower.tail=FALSE)
    #> [1] 1.256475e-12
    

    Note that the two p-values are very close (though not exactly the same) as are the differences in the coefficients from the two methods, which are not printed.

    I wrote a package called factorplot which will calculate all of the pairwise differences for a single variable's coefficients in the MNL model:

    library(factorplot)
    f <- factorplot(mod, variable="sexM")
    print(f)
    #>       Difference    SE p.val
    #> A - N     -0.568 0.166 0.001
    #> A - U      0.224 0.174 0.199
    #> N - U      0.791 0.111 0.000
    #> A - Y      0.000 0.166 0.998
    #> N - Y      0.568 0.098 0.000
    #> U - Y     -0.223 0.112 0.046
    

    Created on 2022-09-29 by the reprex package (v2.0.1)

    This method is a bit easier and is based on the same calculations I proposed in the first method above.