rpredictionlfe

Predict using felm output with standard errors


Is there way to get predict behavior with standard errors from lfe::felm if the fixed effects are swept out using the projection method in felm? This question is very similar to the question here, but none of the answers to that question can be used to estimate standard errors or confidence/prediction intervals. I know that there's currently no predict.felm, but I am wondering if there are workarounds similar to those linked above that might also work for estimating the prediction interval

library(DAAG)
library(lfe)

model1 <- lm(data = cps1, re74 ~ age + nodeg + marr)
predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T, interval="prediction")$fit
# Result:        fit      lwr      upr
# 1 18436.18 2339.335 34533.03

model2 <- felm(data = cps1, re74 ~ age | nodeg + marr)
predict(model2, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T, interval="prediction")$fit
# Does not work

The goal is to estimate a prediction interval for yhat, for which I think I'd need to compute the full variance-covariance matrix (including the fixed effects). I haven't been able to figure out how to do this, and I'm wondering if it's even computationally feasible.


Solution

  • After conversations with several people, I don't believe it is possible to obtain an estimate the distribution of yhat=Xb (where X includes both the covariates and the fixed effects) directly from felm, which is what this question boils down to. It is possible bootstrap them, however. The following code does so in parallel. There is scope for performance improvements, but this gives the general idea.

    Note: here I do not compute full prediction interval, just the SEs on Xb, but obtaining the prediction interval is straightforward - just add the root of sigma^2 to the SE.

    library(DAAG)
    library(lfe)
    library(parallel)
    
    model1 <- lm(data = cps1, re74 ~ age + nodeg + marr)
    yhat_lm <- predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T)
    
    set.seed(42)
    boot_yhat <- function(b) {
      print(b)
      n <- nrow(cps1)
      boot <- cps1[sample(1:n, n, replace=T),]
    
      lm.model <- lm(data=demeanlist(boot[, c("re74", "age")], list(factor(boot$nodeg), factor(boot$marr))), 
                     formula = re74 ~ age)
      fe <- getfe(felm(data = boot, re74 ~ age | nodeg + marr))
    
      bootResult <- predict(lm.model, newdata = data.frame(age = 40)) + 
        fe$effect[fe$fe == "nodeg" & fe$idx==0] + 
        fe$effect[fe$fe == "marr" & fe$idx==1]
      return(bootResult)
    }
    
    B = 1000
    yhats_boot <- mclapply(1:B, boot_yhat)
    
    plot(density(rnorm(10000, mean=yhat_lm$fit, sd=yhat_lm$se.fit)))
    lines(density(yhats), col="red")