rggplot2regressionglmstatistics-bootstrap

How to bootstrap glm regression, estimate 95% confidence interval and plot it?


I am doing glm regression with a 0-1 distribution dataset. It's going well with ggplot2::geom_smooth; here is my code:

library(ggplot2)
 
 set.seed(123)
 df <- transform(data.frame(Conc=runif(200, min=200, max=1000)),
 AE=rbinom(200, 1, prob=plogis((Conc - 600)/100)))
 
ggplot(df, aes(x = Conc, y = AE)) +   
  geom_jitter(height = 0.05, alpha = 0.5) +   
  geom_smooth(method = "glm", formula = y ~ log(x),
              method.args = list(family = "binomial"),
              color = "grey10")

enter image description here

Now, I want to plot the 95% confidence interval with bootstrap. I tried with boot package and ggplot2::mean_cl_boot, but all failed.

I didn't kept all the code that didn't work, but this is the latest code I tried. To be honest, I copied and tried these codes from other answers, but I do not fully understand these codes.

lm_coeffs = function(x, y) {
  coeffs = coefficients(lm(y~log(x)))
  tibble(C = coeffs[1], m=coeffs[2])
}

nboot = 1000

mtboot = lapply (seq_len(nboot), function(i) 
  df %>%
    slice_sample(prop=1, replace=TRUE) %>% 
    summarise(tibble(lm_coeffs(Conc, AE))))
mtboot = do.call(rbind, mtboot)

ggplot(df, aes(Conc, AE)) +
  geom_abline(aes(intercept=C, slope=m), data = mtboot,
              size=0.3, alpha=0.3, color='forestgreen') +
  geom_point() 

Solution

  • This is your model,

    > fit <- glm(AE ~ Conc, family='binomial', data=df)
    > ndf <- data.frame(Conc=seq(min(df$Conc), max(df$Conc), length.out=1e3))
    > pred <- predict(fit, newdata=ndf, type='link')
    

    that you want to bootstrap,

    > set.seed(42)
    > bf <- replicate(
    +   999L, {
    +     bdf <- df[sample.int(nrow(df), replace=TRUE), ]
    +     glm(AE ~ Conc, family='binomial', data=bdf)
    +   },
    +   simplify=FALSE
    + )
    

    and from the bootstrapped predictions,

    > bpred <- sapply(bf, predict, newdata=ndf, type='link')
    

    you want the 95% CI (showing percentile bootstrap method).

    > ci <- \(x, sd) x + as.matrix(sd*(-qt(.025, Inf))) %*% cbind(-1, 1)
    > bpredci <- ci(matrixStats::rowMeans2(bpred), matrixStats::rowSds(bpred))
    

    To plot it, you want to apply the fit$family$linkinv() function, as @GavinSimpson explained in this answer a while ago, like this:

    > plot(df)
    > lines(ndf$Conc, fit$family$linkinv(pred), col=4)
    > for (j in 1:2) lines(ndf$Conc, fit$family$linkinv(bpredci[, j]), col=4, lty=2)
    

    enter image description here

    Gavin's answer provides the ggplot2 way.

    PS: If I use sample data in OP, it looks similar to first plot there.


    Data:

    > set.seed(42)
    > df <- transform(
    +   data.frame(Conc=runif(200, min=200, max=1000)),
    +   AE=rbinom(200, 1, prob=plogis((Conc - 600)/100))
    + )