rstandard-errorrobust

Cannot calculate robust standard errors (vcovHC): multicollinearity and NaN error


I have the following dataset:

share <- c(-0.16,-0.07,0,0,-0.06,-0.06,-0.18,-0.23,-0.07,-0.24,0,0,-0.22,-0.15,0,0,-0.09,-0.2,0,-0.19,0,-0.16,-0.24,-0.14,-0.22,-0.22,0,-0.18,0,0,-0.01,0,0,-0.14,0,-0.06,0,-0.12,0,0,-0.14,0,0,0,-0.02,0,0,0,0,0,-0.19,0,-0.21,-0.08,0,0,0,-0.1,-0.17,0,0,-0.13,-0.08,-0.1,0,-0.05,-0.06,0,0,-0.1,0,0,0,-0.16,0,-0.18,-0.04,-0.08,0,-0.06,0,0,0,-0.2,0,0,-0.08,0,0,-0.01,0,-0.16,-0.08,0,0,0,-0.02,-0.18,0.17,-0.2,-0.14,0,-0.24)
grade <- c(-22.64,-2.39,-15.3,-22.34,-6.39,5.25,0.47,33.61,-0.54,-25.21,16.41,-13.94,21.04,-19.64,-7.48,-32.18,-19.48,21.65,0.88,-1.6,28.49,13.61,-1.69,19.34,2.19,-7.25,6.65,7.25,20.56,16.25,17.24,12.3,3.7,-8.64,8.25,3.27,6,14.32,0.91,8.6,-6.84,13.35,6.99,3.36,0.29,19.46,2,10.75,8.76,-6.54,36.46,28.88,0.44,4.73,3.52,-10.7,-3.88,-2.36,-15.12,9.55,-4.96,10.7,6.84,-10.98,13.03,3.46,14.65,-1.66,6.89,-11.47,-6.52,15.64,3.02,4.11,-8.41,4.77,18.97,-3.73,9.54,-6.06,13.1,-24.62,-2.5,-30.26,3.82,12.97,4,2.21,14.33,12.47,-55.11,19.86,13.7,9.19,-11.01,-2.52,13.1,3.83,1.51,4.18,-3.14,-5.85,8.39)
treat <- c(0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
dataset <- as.data.frame(cbind(share, grade, treat))

I want to run the following linear model and calculate robust standard errors by using the sandwich package.

library(sandwich)
model <- lm(grade ~ treat + share + treat*share, dataset)
robSE <- sqrt(diag(vcovHC(model, type = "HC2")))

The output of the linear model is fine, but the robust standard errors are not calculated: NaN is returned for each standard error.

My guess is that the NaN is due to the variables treat and share being almost perfectly collinear (there is only one observation in which treat=1 and share!=0). The problem is that I must use these variables; I cannot replace them.

Does someone can think of a workaround/solution to this issue?


Solution

  • After doing some reading and following @DominicComtois suggestion, I think here I will have to go with HC1. HC3, HC4 (Long and Ervin 2000, Hayes and Cai 2007), or HC4m (Cribari-Neto and Silva 2011) would be better, but they all give NaN results. The problem with HC0 is that it tends to be biased in small to moderately large samples (Hayes and Cai 2007, Cribari-Neto and Silva 2011).

    References

    Cribari-Neto F., da Silva W.B. (2011). A new heteroskedasticity-consistent covariance matrix estimator for the linear regression model. Adv Stat Anal 95:129–146.

    Hayes, A. F., & Cai, L. (2007). Using heteroscedasticity-consis- tent standard error estimators in OLS regression: An intro- duction and software implementation. Behavior Research Methods, 39, 709–722.

    Long, J.S., Ervin, L.H. (2000). Using heteroscedasticity-consistent standard errors in the linear regression model. Am. Stat. 54, 217–224.