EDIT: I would like help closing my question. This question was originally posted on Cross Validated, should have stayed there because the answer is more of a statistics question and not a programming one, but it got closed over there and was asked to repost it over here. I did that, but the answer is really a stats question so I'll post the answer here. If someone would be willing to close it I would be appreciative.
I have been trying to do some clustered bootstrapping using lm that I can then use in the package sensemakr(). If you're not familiar with it, it's an implemention of some ideas in Cinelli & Hazlett (2020, in the Journal of the Royal Statistical Society: Statistical Methodology, Series B).
The way sensemakr() works is that you run a regression model that produces either an lm or feols-class object, which you can then use in sensemakr().
library(palmerpenguins)
library(sensemakr)
penguin_dat<-penguins
model_lm<-lm(flipper_length_mm ~ sex + body_mass_g, data=penguin_dat)
summary(model_lm)
sensitivity1<-sensemakr(model=model_lm, treatment="body_mass_g", benchmark_covariates="sexmale")
summary(sensitivity1)
I need to find a way to pass models with more complicated error structures to sensemakr, specifically a model with a clustered bootstrap. It currently takes lm models, but I read that you can try and pass feols models from fixest to it if you're willing to use the development version (see: Use sensemakr with fixest feols model (R)). I haven't tried it myself yet, however.
Here are the options I have considered:
library(palmerpenguins)
library(sandwich)
library(sensemakr)
library(lmtest)
penguin_dat<-penguins
model_lm<-lm(flipper_length_mm ~ sex + body_mass_g, data=penguin_dat)
clustered_bootstrap_model_lm<-coeftest(model_lm, vcov = vcovBS, cluster = penguin_dat$species, R = 1000)
The irony of this is that the authors of sensemakr() have also implemented this in Stata, and it's very simple to do a clustered bootstrap and then use the standard errors in another command.
sysuse auto2.dta
regress price mpg weight rep78, vce(bootstrap, reps(100)) cluster(foreign)
est sto m1
But the Stata implementation doesn't allow you to pass the model through, instead making you run the regression within sensemakr itself...and when you do that, it disallows more complicated error structures.
Does anyone have any ideas on how to do this? I greatly appreciate any advice you can give.
Citations: Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1), 39-67. https://doi.org/10.1111/rssb.12348
Cinelli, C., Ferwerda, J., & Hazlett, C. (2020). sensemakr: Sensitivity analysis tools for OLS in R and Stata. Available at SSRN: https://ssrn.com/abstract=3588978 or http://dx.doi.org/10.2139/ssrn.3588978
I talked with the main author of the program, and the previous answer to this one is technically correct but not what I needed.
The true answer is a statistics question rather than a programming one, and as such should not have been closed over on Cross Validated. You can modify the object but there's really no point, since sensemakr() relies on point estimates for almost all of its calculations and not the standard errors. The big exceptions are (1) the robustness values for testing the null hypothesis and (2) the standard error / confidence interval / test statistics for the adjusted estimate.
For those two, the correct answer is to bootstrap sensemakr itself and extract the following estimates into a vector: rv_q (the lower bound of which will give you the bootstrapped equivalent of rv_qa) and adjusted_estimate.
Much of this code was provided to me by Carlos Cinelli (the lead author on the sensemakr package), and I am very grateful for it.
Using the darfur dataset as an example, first create a vector:
adjusted_estimate_boot <- rep(NA, B) # vector to store results
Then resample the dataset a lot of times, each time extracting the adjusted estimates and adding it to the vector.
B <- 5e2; # number of bootstrap samples
<- nrow(darfur) # number of observations in the full data
# boostrap loop
for(i in 1:B){
cat(i,"out of", B, "\n")
# resample data
idx_boot <- sample(1:n, size = n, replace = T)
dat_boot <- darfur[idx_boot, ]
# fit model
my.ols_boot <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +
pastvoted + hhsize_darfur + female + village, data = dat_boot)
# sensemakr
sense.out_boot <- sensemakr(my.ols_boot, treatment = "directlyharmed",
benchmark_covariates = "female",
kd = 1)
# save the estimate you want
adjusted_estimate_boot[i] <- sense.out_boot$bounds[1,"adjusted_estimate"]
}
And then calculate out the new confidence interval:
sig.level <- 0.05
quantile(adjusted_estimate_boot, c(sig.level/2, 1-sig.level/2))
You can do the same thing to extract rv_q, and the lower bound of rv_q in that last part is equivalent to a bootstrapped rv_qa.