Tags: r, cluster-analysis, lm

Comparison of coefficient tests across subsamples


I am trying to conduct a comparison of coefficient tests across two subsamples. To achieve this, I do the following:

full_model <- lm(y ~ v1*subsample_dummy + fixed_effects, data=df)
reduced_model <- lm(y ~ v1 + subsample_dummy + fixed_effects, data=df)

test <- anova(full_model, reduced_model)

The above gives me the result.

However, I am not sure how to do the same when I have to cluster the standard errors by, say, the year variable.

I can compute clustered standard errors for the lm models using the following code:

library(sandwich)
library(lmtest)

# cluster standard errors by year
clustered_se <- vcovCL(full_model, ~ year)
clustered_se1 <- vcovCL(reduced_model, ~ year)

# coefficient tests with clustered standard errors
a <- coeftest(full_model, vcov. = clustered_se)
b <- coeftest(reduced_model, vcov. = clustered_se1)

However, the issue remains, as I still cannot do:

anova(a, b)

How can I do the comparison of coefficient tests across subsamples when the model requires clustered standard errors?


Solution

  • We can use sandwich::vcovCL to get essentially the same standard errors as lfe::felm. Let's estimate some models.

    > est1 <- lfe::felm(y ~ x1 + x2 | id + firm | 0 | firm, data=d)  ## id + firm FE, clustered by firm
    > est2 <- lfe::felm(y ~ x1 | id + firm | 0 | firm, data=d)  ## same, restricted model
    > est3 <- lm(y ~ x1 + x2 + as.factor(id) + as.factor(firm), data=d)  ## as est1 w/o clustering
    > est4 <- lm(y ~ x1 + as.factor(id) + as.factor(firm), data=d)  ## as restricted est3
    

    Comparing the standard errors of est3 with est1,

    > lmtest::coeftest(est3, vcov.=\(x) 
    +                  sandwich::vcovCL(x, cluster=d$firm, type='HC0'))[1:3, ]
                 Estimate Std. Error  t value      Pr(>|t|)
    (Intercept) 3.7416642 0.10554929 35.44945 5.454680e-177
    x1          1.0432612 0.02965723 35.17730 3.631800e-175
    x2          0.4904104 0.03186679 15.38939  5.822822e-48
    > coef(summary(est1))
        Estimate Cluster s.e.  t value     Pr(>|t|)
    x1 1.0432612   0.02968696 35.14207 1.799173e-13
    x2 0.4904104   0.03189874 15.37398 2.931632e-09
    

    yields essentially the same standard errors.

    Thus, following a post on Cross Validated [1], we can compare the full and restricted models (here via their lm counterparts est3 and est4) using lmtest::waldtest (there is also an lfe::waldtest, which works differently).

    > lmtest::waldtest(est3, est4, vcov=\(x) 
    +                  sandwich::vcovCL(x, cluster=d$firm, type='HC0'))
    Wald test
    
    Model 1: y ~ x1 + x2 + as.factor(id) + as.factor(firm)
    Model 2: y ~ x1 + as.factor(id) + as.factor(firm)
      Res.Df Df      F    Pr(>F)    
    1    966                        
    2    967 -1 236.83 < 2.2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
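
    Applied to the models from the question, the same approach looks roughly as follows (a minimal sketch; it assumes year, the clustering variable, is a column of df):

    library(sandwich)
    library(lmtest)
    ## Wald test comparing the full and reduced models with standard errors
    ## clustered by year; this replaces anova(full_model, reduced_model)
    waldtest(full_model, reduced_model,
             vcov = function(x) vcovCL(x, cluster = ~ year))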
    

    Hope this brings you a step further.

    BTW: You could definitely file a feature request for this as an issue on the package author's GitHub.


    Data:

    > set.seed(42)
    > n <- 1e3
    > d <- data.frame(x1=rnorm(n), x2=rnorm(n), id=factor(sample(20, n, replace=TRUE)),
    +                 firm=factor(sample(13, n, replace=TRUE)), u=rnorm(n))
    > id.eff <- rnorm(nlevels(d$id))
    > firm.eff <- rnorm(nlevels(d$firm))
    > d$y <- d$x1 + 0.5 * d$x2 + id.eff[d$id] + firm.eff[d$firm] + d$u