rpurrr

How to generate residual plots when running a series of regressions with purrr::map?


I am running a long series of regression models using purrr and broom. I'd like to generate plots of the residuals for each of them, and ideally save them in a folder somewhere.

My code for the regressions (in simplified form) looks like this:

library(tidyverse)

set.seed(1)
df <- data.frame(x1 = rnorm(100, 0, 1), 
                 x2 = rnorm(100, 0, 1),
                 y1 = rnorm(100, 0, 1), 
                 y2 = rnorm(100, 0, 1), 
                 t2 = sample(0:1, 100, replace = TRUE), 
                 t1 = sample(0:1, 100, replace = TRUE))

models <- data.frame(outcome = c("y1", "y2", "y2"),
                     treatment = c("t1", "t1", "t2"),
                     covariates = c("x1", "x1", "x1 + x2"))

regression_output <- models %>% 
  mutate(formula = paste(outcome, "~", treatment, "+", covariates), 
         fit = map(formula, ~ tidy(lm(.x, df)))) %>% 
  unnest(fit) %>% 
  filter(term == treatment)

I've written a simple function to save a QQ plot:

save_residual_plot <- function(formula, fit) {
  jpeg(paste0("QQ plot - ", formula, ".jpg"))
  plot(fit, which = 2)
  dev.off()
}

But I have not found a way to run this within the nested dataframe/map framework. Is it possible to do this? Or is there a better way to check the residuals when running a whole series of regression models?


Solution

  • The following works. Create the formulas, keep them in a vector fmla_vec, then fit the models and call the plotting function.

    suppressPackageStartupMessages({
      library(tidyverse)
    })
    
    set.seed(1)
    df <- data.frame(x1 = rnorm(100, 0, 1), 
                     x2 = rnorm(100, 0, 1),
                     y1 = rnorm(100, 0, 1), 
                     y2 = rnorm(100, 0, 1), 
                     t2 = sample(0:1, 100, replace = TRUE), 
                     t1 = sample(0:1, 100, replace = TRUE))
    
    models <- data.frame(outcome = c("y1", "y2", "y2"),
                         treatment = c("t1", "t1", "t2"),
                         covariates = c("x1", "x1", "x1 + x2"))
    
    save_residual_plot <- function(formula, fit) {
      jpeg(paste0("QQ plot - ", formula, ".jpg"))
      plot(fit, which = 2)
      dev.off()
    }
    
    fmla_vec <- models %>% 
      mutate(formula = paste(outcome, "~", treatment, "+", covariates)) %>%
      pull(formula) 
    
    fmla_vec %>%
      map(lm, data = df) %>%
      mapply(save_residual_plot, fmla_vec, fit = .)
    #>      y1 ~ t1 + x1.png      y2 ~ t1 + x1.png y2 ~ t2 + x1 + x2.png 
    #>                     2                     2                     2
    

    Created on 2023-10-01 with reprex v2.0.2