
Estimating Robust Standard Errors from Covariate Balanced Propensity Score Output


I'm using the Covariate Balancing Propensity Score (CBPS) package and I want to estimate robust standard errors for my ATT results that incorporate the weights. The MatchIt and twang tutorials both recommend using the survey package to incorporate weights into the estimate of robust standard errors, and it seems to work:

library(survey)

# Survey design that applies the CBPS weights to each observation
design.CBPS <- svydesign(ids = ~1, weights = CBPS.object$weights, data = SUCCESS_All.01)
SE <- svyglm(dv ~ treatment, design = design.CBPS)
summary(SE)  # reports design-based (robust) standard errors

Additionally, the survey SEs are substantially different from the default lm() way of estimating coefficients and SEs provided by the CBPS package. For those more familiar with either the CBPS or survey packages, is there any reason why this would be inappropriate or violate some assumption of the CBPS method? I don't see anything in the CBPS documentation about how best to estimate standard errors, so that's why I'm slightly concerned.


Solution

Sandwich (robust) standard errors are the most commonly used standard errors after propensity score weighting (including CBPS). For the ATE, they are known to be conservative (too large), and for the ATT, they can be either too large or too small. For parametric methods like CBPS, it is possible to use M-estimation to account for both the estimation of the propensity scores and the outcome model, but this is fairly complicated, especially for specialized models like CBPS.
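
For example, a minimal sketch of the sandwich approach, assuming a fitted CBPS object cbps.fit and a data frame dat with outcome dv and a 0/1 treatment (all placeholder names, not from the original post):

library(sandwich)
library(lmtest)

# Weighted outcome regression using the estimated CBPS weights
fit <- lm(dv ~ treatment, data = dat, weights = cbps.fit$weights)

# Robust (sandwich) standard errors for the treatment coefficient
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))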

The alternative is to use the bootstrap, in which you re-estimate both the propensity score model and the treatment effect in each bootstrap sample. The WeightIt documentation contains an example of how to use bootstrapping to estimate the confidence interval around a treatment effect estimate.
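
As an illustration, here is a sketch of that idea using the boot package, with the same placeholder names as above (the covariates x1 and x2 stand in for whatever is in your propensity score model):

library(boot)
library(CBPS)

# Re-estimate the weights and the treatment effect in every bootstrap sample
att_est <- function(data, i) {
  d <- data[i, ]
  ps <- CBPS(treatment ~ x1 + x2, data = d, ATT = 1)  # refit the propensity model
  fit <- lm(dv ~ treatment, data = d, weights = ps$weights)
  coef(fit)["treatment"]  # ATT estimate for this replicate
}

set.seed(2023)
boot.out <- boot(data = dat, statistic = att_est, R = 1000)
sd(boot.out$t)                    # bootstrap standard error
boot.ci(boot.out, type = "perc")  # percentile confidence interval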

Using the survey package is one way to get robust standard errors, but there are other packages you can use, such as the sandwich package, as recommended in the MatchIt documentation. Under no circumstances should you use or even consider the usual lm() standard errors; these are completely inaccurate for inverse probability weights. The AsyVar() function in CBPS seems like it should provide valid standard errors, but in my experience these are also wildly inaccurate (compared to a bootstrap); the function doesn't even get the treatment effect right.
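
Continuing the sketch above, the discrepancy you noticed corresponds to comparing these two outputs side by side:

summary(fit)$coefficients                         # naive lm() SEs; do not report these
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))  # robust SEs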

I recommend you use a bootstrap. It may take some time (you ideally want around 1000 bootstrap replications), but these standard errors will be the most accurate.