Hi I would like to compare outcomes between the treatment and control groups by bootstrapping the matched sample. This matched sample was obtained by using the genetic matching approach provided by MatchIt
. According to this paper
Austin, P. C., & Small, D. S. (2014). The use of bootstrapping when using propensity‐score matching without replacement: a simulation study. Statistics in medicine, 33(24), 4306-4319.
I think I have to bootstrap the matched sample based on each matched pair, not each individual. However, I don't know how to re-sample by each matched pair.
Here I provide an example:
id <- c("A", "B", "C", "D", "E", "F")
treatment <- c(1, 0, 1, 0, 1, 0)
subclass <- c(1, 1, 2, 2, 3, 3)
outcome1 <- c(100, 300, 400, 500, 600, 700)
outcome2 <- c(200, 50, 600, 800, 900, 1000)
matched_sample <- data.frame(id, treatment, subclass, outcome1, outcome2)
> matched_sample
id treatment subclass outcome1 outcome2
1 A 1 1 100 200
2 B 0 1 300 50
3 C 1 2 400 600
4 D 0 2 500 800
5 E 1 3 600 900
6 F 0 3 700 1000
Subclass indicates the matched pair. For example, individual A and B are a matched pair because they share the same subclass number. Whenever A appears in any sample, B should also appear in that sample.
After bootstrapping, I will run regression on outcome1
and outcome2
to estimate the average treatment effects (ATE), and also to obtain the 95% confidence intervals of the ATEs.
I think the package boot
might be useful, but I'm not sure how to use it. I would be really grateful for your help on this.
EDIT: The ATEs that I would like to estimate are basically the coefficients of "treatment" in regressions. That is,
lm.ATE1 <- lm(outcome1 ~ treatment)
lm.ATE2 <- lm(outcome2 ~ treatment)
The idea is to bootstrap the matched sample for 10,000 times, estimate these two regressions within each bootstrapped samples, rank the resulting coefficients, and then find coefficients at the 2.5 and 97.5 percentile as the 95% confidence intervals for the ATE on outcome1
and outcome2
respectively.
Hopefully this clarifies. Thanks in advance.
The following function resamples from matched_sample
R
times, keeping matched pairs. Then it computes two regressions and extracts the coefficients the question names ATE*
, returning a matrix 2xR. Finally, it uses apply
to get the percentile 95% confidence intervals.
id <- c("A", "B", "C", "D", "E", "F")
treatment <- c(1, 0, 1, 0, 1, 0)
subclass <- c(1, 1, 2, 2, 3, 3)
outcome1 <- c(100, 300, 400, 500, 600, 700)
outcome2 <- c(200, 50, 600, 800, 900, 1000)
matched_sample <- data.frame(id, treatment, subclass, outcome1, outcome2)
fun_boot <- function(data, R = 10000L) {
f <- function() {
b <- sample(uniq_sclass, n, TRUE)
out <- sp[match(b, uniq_sclass)]
out <- do.call(rbind, out)
lm.ATE1 <- lm(outcome1 ~ treatment, out)
lm.ATE2 <- lm(outcome2 ~ treatment, out)
c(ATE1 = unname(coef(lm.ATE1))[2],
ATE2 = unname(coef(lm.ATE2))[2])
}
sp <- split(data, data$subclass)
n <- length(sp)
uniq_sclass <- names(sp)
replicate(R, f())
}
set.seed(2022)
# change this value to 10,000
R <- 10L
bootres <- fun_boot(matched_sample, R)
t(apply(bootres, 1, quantile, probs = c(0.025, 0.975)))
#> 2.5% 97.5%
#> ATE1 -159.1667 -100.00000
#> ATE2 -159.1667 47.91667
Created on 2022-08-12 by the reprex package (v2.0.1)