r dplyr tidyeval

Programmatically filter using expressions stored in another data frame


I have a data frame containing the specification for a set of regression models (regress_grid), with one column per aspect of the model. I then use dplyr::rowwise() to estimate a model for each row of regress_grid using the analysis dataset (mtcars). This is adapted from the dataless grids approach in Tim Tiefenbach's blog post.

A minimal example of what I'm attempting is below:

library("tibble")
library("rlang")
library("dplyr")

# Regression specification for 2 models with different explanatory variables and samples (specified based on variables in the analysis dataset), including columns with filter expressions (strat1 and strat2)
regress_grid = tribble(
  ~strat1,          ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt","qsec") ,
  expr(carb != 1), TRUE,           c("wt") )
regress_grid


# Use rowwise to add a mod column containing the lm object.
regress_grid1 = regress_grid |>
  dplyr::rowwise() |>
  dplyr::mutate(mod = list(lm(stats::reformulate(termlabels = term_labels,
                                                               response = "mpg"),
                                            data = filter(mtcars, 
                                                          eval(strat1), 
                                                          eval(strat2)))))

I can't make this work when I wrap the code in a function and try to make it flexible enough to take an unknown number of columns of regress_grid containing filter expressions. I want to specify the columns in regress_grid containing the filter expressions as a list, with list(TRUE) as the default so that no observations are filtered out. The default runs successfully with no filtering, but if I pass a list of columns from regress_grid containing filter expressions (strat1 and strat2), it fails.


# Function 
regress_func = function(reg_grid, termlabels, data, filters = list(TRUE)){

  reg_grid = reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(lm(stats::reformulate(termlabels = {{termlabels}},
                                                                     response = "mpg"),
                                                  data = filter(data, !!!filters)))))
  return(reg_grid)
}

# This works when no filter expressions are specified and the default list(TRUE) is used.
regress_grid2 = regress_func(reg_grid = regress_grid,
                  termlabels = term_labels,
                  data = mtcars)

# I can't work out how to specify the options for filtering within a function                  
regress_grid2 = regress_func(reg_grid = regress_grid,
                  termlabels = term_labels,
                  filters = list(strat1, strat2),
                  data = mtcars) 

In its current form this results in an Error: object 'strat1' not found. I have tried various combinations of eval, expr, enexpr and enquo, but I seem to go round in circles. I have attempted to digest the rlang and metaprogramming documentation, but I haven't been able to wrap my head around it.

Ideally, I wouldn't use ... here as I was planning to use these for something else in my real use case. I don't feel strongly whether I specify the columns containing filter expressions as list(strat1, strat2) or list("strat1", "strat2"). I have attempted to make my question specific but if there are other ways to approach this I would be very interested.

I have found several previous questions about dynamically specifying filter arguments here, here, here, and here but none quite answer my question.


Solution

  • You kind of need double injection. Here's a helper function that turns a call like list(a, b) into list(!!a, !!b), so you can inject those expressions into the list itself.

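    # Turn a quosure wrapping list(a, b) into a list of calls: !!a, !!b
    splicelist <- function(x) {
      stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
      # Drop the leading `list` symbol and wrap each argument in `!!`
      Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
    }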
    

    Then update your regression function to capture the unevaluated filters, process them, and inject them:

    regress_func = function(reg_grid, termlabels, data, filters = list(TRUE)){
      filters <- splicelist(rlang::enquo(filters))
      reg_grid = reg_grid |>
        dplyr::rowwise() |>
        dplyr::mutate(mod = list(rlang::inject(lm(stats::reformulate(termlabels = {{termlabels}},
                                                                     response = "mpg"),
                                                  data = filter(data, !!!filters)))))
      return(reg_grid)
    }
    

    This should then work with both of your examples.
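To check the pieces together, here is a minimal end-to-end sketch that repeats the grid, the helper, and the updated function from this thread, then runs both calls. The object names helper_out, fits_all and fits_sub are only illustrative, and the row counts are compared against a direct dplyr::filter() call rather than asserted as fixed numbers.

```r
library(tibble)
library(rlang)
library(dplyr)

# Grid from the question: two filter-expression columns plus the model terms
regress_grid <- tribble(
  ~strat1,         ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt", "qsec"),
  expr(carb != 1), TRUE,           c("wt")
)

# Helper from the answer: list(a, b) becomes a list of the calls !!a and !!b
splicelist <- function(x) {
  stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
  Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
}

# Updated function from the answer
regress_func <- function(reg_grid, termlabels, data, filters = list(TRUE)) {
  filters <- splicelist(rlang::enquo(filters))
  reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(
      lm(stats::reformulate(termlabels = {{ termlabels }}, response = "mpg"),
         data = filter(data, !!!filters))
    )))
}

# Inspect what the helper produces for list(strat1, strat2)
helper_out <- splicelist(rlang::quo(list(strat1, strat2)))
print(helper_out)

# Default: no filtering, so every model is fitted on all of mtcars
fits_all <- regress_func(regress_grid, termlabels = term_labels, data = mtcars)

# Per-row filtering driven by the strat1 and strat2 columns
fits_sub <- regress_func(regress_grid, termlabels = term_labels, data = mtcars,
                         filters = list(strat1, strat2))

# Number of observations per fitted model, next to a direct filter for row 1
print(sapply(fits_all$mod, nobs))
print(sapply(fits_sub$mod, nobs))
print(nrow(dplyr::filter(mtcars, carb != 1, cyl != 4)))
```

As far as I understand it, the two injection stages are: mutate() splices the list of `!!strat1`-style calls into filter() when it captures its arguments, and rlang::inject() then resolves each `!!strat1` against the current row, where the list-column holds the stored expression.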