rmatchbootresamplingbootstrapping

Bootstrap in pair within a matched sample


Hi I would like to compare outcomes between the treatment and control groups by bootstrapping the matched sample. This matched sample was obtained by using the genetic matching approach provided by MatchIt. According to this paper

Austin, P. C., & Small, D. S. (2014). The use of bootstrapping when using propensity‐score matching without replacement: a simulation study. Statistics in medicine, 33(24), 4306-4319.

I think I have to bootstrap the matched sample based on each matched pair, not each individual. However, I don't know how to re-sample by each matched pair.

Here I provide an example:

id <- c("A", "B", "C", "D", "E", "F")
treatment <- c(1, 0, 1, 0, 1, 0)
subclass <- c(1, 1, 2, 2, 3, 3)
outcome1 <- c(100, 300, 400, 500, 600, 700)
outcome2 <- c(200, 50, 600, 800, 900, 1000)

matched_sample <- data.frame(id, treatment, subclass, outcome1, outcome2)

> matched_sample
  id treatment subclass outcome1 outcome2
1  A         1        1      100      200
2  B         0        1      300       50
3  C         1        2      400      600
4  D         0        2      500      800
5  E         1        3      600      900
6  F         0        3      700     1000

Subclass indicates the matched pair. For example, individual A and B are a matched pair because they share the same subclass number. Whenever A appears in any sample, B should also appear in that sample.

After bootstrapping, I will run regression on outcome1 and outcome2 to estimate the average treatment effects (ATE), and also to obtain the 95% confidence intervals of the ATEs.

I think the package boot might be useful, but I'm not sure how to use it. I would be really grateful for your help on this.

EDIT: The ATEs that I would like to estimate are basically the coefficients of "treatment" in regressions. That is,

lm.ATE1 <- lm(outcome1 ~ treatment)
lm.ATE2 <- lm(outcome2 ~ treatment)

The idea is to bootstrap the matched sample for 10,000 times, estimate these two regressions within each bootstrapped samples, rank the resulting coefficients, and then find coefficients at the 2.5 and 97.5 percentile as the 95% confidence intervals for the ATE on outcome1 and outcome2 respectively.

Hopefully this clarifies. Thanks in advance.


Solution

  • The following function resamples from matched_sample R times, keeping matched pairs. Then it computes two regressions and extracts the coefficients the question names ATE*, returning a matrix 2xR. Finally, it uses apply to get the percentile 95% confidence intervals.

    id <- c("A", "B", "C", "D", "E", "F")
    treatment <- c(1, 0, 1, 0, 1, 0)
    subclass <- c(1, 1, 2, 2, 3, 3)
    outcome1 <- c(100, 300, 400, 500, 600, 700)
    outcome2 <- c(200, 50, 600, 800, 900, 1000)
    matched_sample <- data.frame(id, treatment, subclass, outcome1, outcome2)
    
    fun_boot <- function(data, R = 10000L) {
      f <- function() {
        b <- sample(uniq_sclass, n, TRUE)
        out <- sp[match(b, uniq_sclass)]
        out <- do.call(rbind, out)
        lm.ATE1 <- lm(outcome1 ~ treatment, out)
        lm.ATE2 <- lm(outcome2 ~ treatment, out)
        c(ATE1 = unname(coef(lm.ATE1))[2], 
          ATE2 = unname(coef(lm.ATE2))[2])
      }
      sp <- split(data, data$subclass)
      n <- length(sp)
      uniq_sclass <- names(sp)
      replicate(R, f())
    }
    
    set.seed(2022)
    
    # change this value to 10,000
    R <- 10L
    bootres <- fun_boot(matched_sample, R)
    t(apply(bootres, 1, quantile, probs = c(0.025, 0.975)))
    #>           2.5%      97.5%
    #> ATE1 -159.1667 -100.00000
    #> ATE2 -159.1667   47.91667
    

    Created on 2022-08-12 by the reprex package (v2.0.1)