I have a data frame (regress_grid) that specifies a set of regression models, with one column for each aspect of a model. I then use dplyr::rowwise() to estimate a model for each row of regress_grid using the analysis dataset (mtcars). This is adapted from the dataless grids approach in Tim Tiefenbach's blog post.
A minimal example of what I'm attempting is below:
library("tibble")
library("rlang")
library("dplyr")
# Regression specification for 2 models with different explanatory variables and samples (the samples are defined using variables in the analysis dataset), including columns with filter expressions (strat1 and strat2)
regress_grid = tribble(
  ~strat1,         ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt", "qsec"),
  expr(carb != 1), TRUE,           c("wt")
)
regress_grid
# Use rowwise to add a mod column containing the lm object.
regress_grid1 = regress_grid |>
  dplyr::rowwise() |>
  dplyr::mutate(mod = list(lm(stats::reformulate(termlabels = term_labels,
                                                 response = "mpg"),
                              data = filter(mtcars,
                                            eval(strat1),
                                            eval(strat2)))))
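To see why eval(strat1) works inside rowwise(), it can help to inspect regress_grid directly. The sketch below assumes only tibble and rlang (mtcars comes from base R's datasets package). It shows that each filter column is a list-column of unevaluated calls, and that evaluating one stored call against the data gives the logical row selector that filter() needs.

```r
library(tibble)
library(rlang)

regress_grid <- tribble(
  ~strat1,         ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt", "qsec"),
  expr(carb != 1), TRUE,           c("wt")
)

# Each filter column is a list-column: one stored call (or TRUE) per row
is.list(regress_grid$strat1)

# Under rowwise(), `strat1` refers to a single element, e.g. the call carb != 1
cond <- regress_grid$strat1[[1]]
is.call(cond)

# Evaluating that call with mtcars as the environment gives a logical vector
keep <- eval(cond, mtcars)
sum(keep)  # 25 of the 32 cars have carb != 1
```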
I can't make this work when I wrap the code in a function and try to make it flexible enough to take an unknown number of regress_grid columns containing filter expressions. I want to specify those columns as a list, with list(TRUE) as the default so that no observations are filtered. The default runs successfully, but when I pass a list of the regress_grid columns containing filter expressions (strat1 and strat2), I can't find a way to make it work.
# Function
regress_func = function(reg_grid, termlabels, data, filters = list(TRUE)){
  reg_grid = reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(lm(stats::reformulate(termlabels = {{termlabels}},
                                                                 response = "mpg"),
                                              data = filter(data, !!!filters)))))
  return(reg_grid)
}
# This works when no filter expressions are specified and the default list(TRUE) is used.
regress_grid2 = regress_func(reg_grid = regress_grid,
termlabels = term_labels,
data = mtcars)
# I can't work out how to specify the options for filtering within a function
regress_grid2 = regress_func(reg_grid = regress_grid,
termlabels = term_labels,
filters = list(strat1, strat2),
data = mtcars)
In its current form this results in Error: object 'strat1' not found.
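The error follows from ordinary lazy evaluation: when !!!filters is processed, R forces the filters argument, so list(strat1, strat2) is evaluated in the environment regress_func() was called from (here the global environment). strat1 only exists as a column inside the rowwise data mask, so the lookup fails. A minimal sketch with two hypothetical helpers shows the difference between forcing an argument and capturing it with rlang::enexpr(); it assumes no object named strat1 exists in the global environment.

```r
library(rlang)

# Hypothetical helpers, for illustration only
force_filters   <- function(filters) length(filters)   # evaluates the argument
capture_filters <- function(filters) enexpr(filters)   # captures the code instead

# Forcing fails: strat1 is looked up in the calling (global) environment
res <- try(force_filters(list(strat1, strat2)), silent = TRUE)
inherits(res, "try-error")

# Capturing returns the unevaluated call list(strat1, strat2)
captured <- capture_filters(list(strat1, strat2))
captured
```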
I have tried various combinations of eval, expr, enexpr and enquo, but I seem to go round in circles.
I have tried to digest the rlang and metaprogramming documentation, but I haven't been able to wrap my head around it.
Ideally, I wouldn't use ... here, as I plan to use it for something else in my real use case.
I don't mind whether I specify the columns containing filter expressions as list(strat1, strat2) or list("strat1", "strat2").
I have tried to keep my question specific, but I would be very interested in other ways to approach this.
I have found several previous questions about dynamically specifying filter arguments here, here, here, and here, but none quite answers my question.
You need a kind of double injection. Here's a helper function that turns a captured call to list, such as list(a, b), into list(!!a, !!b) (a list of unevaluated expressions), so you can inject those expressions into the call itself.
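splicelist <- function(x) {
  # x must be a quosure wrapping a call to list(), e.g. list(strat1, strat2)
  stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
  # Wrap each argument of that call in !! so it can be injected later
  Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
}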
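To see what the helper produces, it can be called on a quosure built by hand with rlang::quo(). The sketch repeats splicelist() so it runs on its own. Each element of the result is an unevaluated !!strat call. As I understand the mechanics, the first injection (!!!filters, processed when mutate() captures its argument) splices these calls into filter(), and the second (rlang::inject(), run once per row) replaces each strat column name with that row's stored expression.

```r
library(rlang)

splicelist <- function(x) {
  stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
  Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
}

out <- splicelist(quo(list(strat1, strat2)))
length(out)  # one element per filter column
out[[1]]     # the call !!strat1, i.e. !(!strat1) to base R

# The default list(TRUE) becomes a single !!TRUE, which keeps every row
default_out <- splicelist(quo(list(TRUE)))
default_out[[1]]
```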
Then update your regression function to capture the unevaluated filters, process them, and inject them:
regress_func = function(reg_grid, termlabels, data, filters = list(TRUE)){
  # Capture `filters` unevaluated and wrap each element in !! (see splicelist())
  filters <- splicelist(rlang::enquo(filters))
  reg_grid = reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(lm(stats::reformulate(termlabels = {{termlabels}},
                                                                 response = "mpg"),
                                              data = filter(data, !!!filters)))))
  return(reg_grid)
}
This should then work with both of your examples.
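If the double injection is hard to follow, another option is to pass the filter columns as a character vector of column names (close to the list("strat1", "strat2") form the question says is acceptable) and splice each row's conditions inside a small helper, where they are ordinary local variables. This sketch swaps rowwise() for base R's Map(). The names regress_func2() and fit_one() are hypothetical, and it assumes dplyr (1.0 or later), rlang and tibble are installed.

```r
library(dplyr)
library(rlang)
library(tibble)

regress_grid <- tribble(
  ~strat1,         ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt", "qsec"),
  expr(carb != 1), TRUE,           c("wt")
)

# Hypothetical alternative: `filters` is a character vector of column names
regress_func2 <- function(reg_grid, termlabels, data, filters = character()) {
  # One unnamed list of filter conditions per row of reg_grid
  conds <- if (length(filters) == 0) {
    rep(list(list()), nrow(reg_grid))
  } else {
    do.call(Map, c(list(f = list), unname(as.list(reg_grid[filters]))))
  }
  # Inside this helper `cond` is a plain local list, so one !!! is enough
  fit_one <- function(terms, cond) {
    lm(stats::reformulate(termlabels = terms, response = "mpg"),
       data = dplyr::filter(data, !!!cond))
  }
  reg_grid$mod <- Map(fit_one, dplyr::pull(reg_grid, {{ termlabels }}), conds)
  reg_grid
}

unfiltered <- regress_func2(regress_grid, term_labels, mtcars)
filtered <- regress_func2(regress_grid, term_labels, mtcars,
                          filters = c("strat1", "strat2"))
# Rows used per model: carb != 1 & cyl != 4 for row 1, carb != 1 for row 2
sapply(filtered$mod, nobs)
```

Because each row's conditions reach fit_one() as an ordinary argument, they only need to be spliced once, and the function's ... remains free for other uses, as the question wanted.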