rpurrr

# How to generate residual plots when running a series of regressions with purrr::map?

I am running a long series of regression models using `purrr` and `broom`. I'd like to generate plots of the residuals for each of them, and ideally save them in a folder somewhere.

My code for the regressions (in simplified form) looks like this:

``````library(tidyverse)

set.seed(1)
df <- data.frame(x1 = rnorm(100, 0, 1),
x2 = rnorm(100, 0, 1),
y1 = rnorm(100, 0, 1),
y2 = rnorm(100, 0, 1),
t2 = sample(0:1, 100, replace = TRUE),
t1 = sample(0:1, 100, replace = TRUE))

models <- data.frame(outcome = c("y1", "y2", "y2"),
treatment = c("t1", "t1", "t2"),
covariates = c("x1", "x1", "x1 + x2"))

regression_output <- models %>%
mutate(formula = paste(outcome, "~", treatment, "+", covariates),
fit = map(formula, ~ tidy(lm(.x, df)))) %>%
unnest(fit) %>%
filter(term == treatment)
``````

I've written a simple function to save a QQ plot:

``````save_residual_plot <- function(formula, fit) {
jpeg(paste0("QQ plot - ", formula, ".jpg"))
plot(fit, which = 2)
dev.off()
}
``````

But I have not found a way to run this within the nested dataframe/map framework. Is it possible to do this? Or is there a better way to check the residuals when running a whole series of regression models?

Solution

• The following works. Create the formulas, keep them in a vector `fmla_vec`, then fit the models and call the plotting function.

``````suppressPackageStartupMessages({
library(tidyverse)
})

set.seed(1)
df <- data.frame(x1 = rnorm(100, 0, 1),
x2 = rnorm(100, 0, 1),
y1 = rnorm(100, 0, 1),
y2 = rnorm(100, 0, 1),
t2 = sample(0:1, 100, replace = TRUE),
t1 = sample(0:1, 100, replace = TRUE))

models <- data.frame(outcome = c("y1", "y2", "y2"),
treatment = c("t1", "t1", "t2"),
covariates = c("x1", "x1", "x1 + x2"))

save_residual_plot <- function(formula, fit) {
jpeg(paste0("QQ plot - ", formula, ".jpg"))
plot(fit, which = 2)
dev.off()
}

fmla_vec <- models %>%
mutate(formula = paste(outcome, "~", treatment, "+", covariates)) %>%
pull(formula)

fmla_vec %>%
map(lm, data = df) %>%
mapply(save_residual_plot, fmla_vec, fit = .)
#>      y1 ~ t1 + x1.png      y2 ~ t1 + x1.png y2 ~ t2 + x1 + x2.png
#>                     2                     2                     2
``````

Created on 2023-10-01 with reprex v2.0.2