r, dplyr, non-standard-evaluation

How to pass unquoted argument to filter() within user defined function


I am trying to write a function that allows a user to provide an unquoted argument to a function that will filter a dataset.

For simplicity I have used the iris dataset and mocked up a function.

filter_function <- function(var_to_filter_on){
  
  test_data <- iris %>%
    filter(Species %in% {{var_to_filter_on}})
  
  print(test_data)
  
}

This however returns:

Error in `filter()`:
ℹ In argument: `Species %in% virginica`.
Caused by error:
! object 'virginica' not found

I believe this is an issue with "non standard evaluation", but my understanding is not great. I thought that the {{}} operator would allow me to pass "virginica" without having to quote it in the function call?

I'm also using %in% because I want the user to have the flexibility to supply a vector of names to filter on.

How do I get this function to work?

Cheers


Solution

  • The {{ operator is meant to pass unquoted variable names to a function that uses tidy evaluation:

    The embrace operator {{ is used to create functions that call other data-masking functions. It transports a data-masked argument (an argument that can refer to columns of a data frame) from one function to another.

    When you call filter_function(virginica), the symbol virginica (not the literal string "virginica") is injected into the expression Species %in% {{var_to_filter_on}}, so you are effectively calling:

    test_data <- iris %>%
        filter(Species %in% virginica)
    

    as you can see from the error message. Since virginica is neither a variable in your environment nor a column in iris, you get an error.
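
    For contrast, here is a minimal sketch of the situation {{ is designed for: forwarding an unquoted column name, while the values to match are passed as ordinary strings. The helper name filter_column_in and its arguments are made up for illustration and are not part of dplyr.

    ```r
    library(dplyr)

    # Hypothetical helper: `column` is a data-masked argument (a column of `data`),
    # `values` is ordinary data (a character vector).
    filter_column_in <- function(data, column, values) {
      data %>%
        filter({{ column }} %in% values)
    }

    # `Species` is a column of iris, so it can be passed unquoted;
    # the species names are data, so they are quoted.
    res <- filter_column_in(iris, Species, c("virginica", "setosa"))
    ```

    Here Species resolves because it exists as a column in the data mask, which is exactly what virginica fails to do in the original function.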

    If you want to allow the user to provide the species epithet as a symbol rather than a string, you'd need to capture the argument and then convert it to a string:

    filter_function <- function(var_to_filter_on){
    
      test_data <- iris %>%
        filter(Species %in% rlang::as_string(rlang::enexpr(var_to_filter_on)))
      
      print(test_data)
      
    }
    

    If you want to be able to supply multiple species using c(), then you'd have to extract the arguments from the call object using as.list(rlang::enexpr(var_to_filter_on))[-1] and convert each one to a string, or pass them in as dotted arguments (...) and capture them with rlang::enexprs().
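
    A rough sketch of the dotted-arguments variant might look like the following. The function name filter_species and the local variable wanted are illustrative choices, and the sketch assumes every argument is a bare symbol or a string.

    ```r
    library(dplyr)

    # Hypothetical variant: each species is passed as its own unquoted argument.
    filter_species <- function(...) {
      # Capture the arguments without evaluating them, then turn each
      # captured symbol into a string.
      wanted <- vapply(rlang::enexprs(...), rlang::as_string, character(1))

      iris %>%
        filter(Species %in% wanted)
    }

    res_dots <- filter_species(virginica, setosa)
    ```

    The capture happens before filter() is called, so filter() only ever sees a plain character vector.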

    That said, I think it makes more sense to follow @Edward's suggestion and simply pass in a string. Unquoted symbols refer to variables, while the species epithet is data, which is represented with quoted strings. Mixing the two creates a non-idiomatic interface for your users.
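
    Assuming the suggestion amounts to passing a character vector, a minimal version needs no tidy-evaluation machinery at all. The argument name species_to_keep is just an illustrative choice.

    ```r
    library(dplyr)

    filter_function <- function(species_to_keep) {
      # species_to_keep is an ordinary character vector, so %in% works
      # for one value or several without any quoting tricks.
      iris %>%
        filter(Species %in% species_to_keep)
    }

    one  <- filter_function("virginica")
    many <- filter_function(c("virginica", "setosa"))
    ```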