Tags: r, rlang, tidyeval, nse

Is there a better way to use NSE in a function to concatenate dataframe columns?


Background

I'm trying to get a grip on the metaprogramming methods in Advanced R and, not being a programmer by background, it is taking some effort. I am trying to write functions that manipulate dataframe columns passed as unquoted names (tidyverse style). This is easy enough with dplyr verbs, using enquo() and !! or curly-curly {{ }}, but less intuitive when I want to do something inside a function akin to var <- c(df$colA, df$colB), which doesn't require a tidyverse verb.

Setup

library(tidyverse)
library(rlang)

df <- tibble(col1 = c("A", "B", "C"),
             col2 = c("D", "E", "F"),
             col3 = c("G", "H", "I"),
             col4 = c("J", "K", "L"))
> df
# A tibble: 3 × 4
  col1  col2  col3  col4 
  <chr> <chr> <chr> <chr>
1 A     D     G     J    
2 B     E     H     K    
3 C     F     I     L    

The Function

Most of the time, the dataframe(s) I'll be using have the same column names, so setting them as defaults will save typing. The function below works, but all the enquo(), expr(), !!, and eval_tidy() calls seem excessively verbose for such a small example:

myFun3 <- function(df, var1 = col1, var2 = col2){
  var1 <- enquo(var1)
  var2 <- enquo(var2)

  var3 <- expr(c(!!var1, !!var2))
  #print(var3)
  out <- tibble(var_out = eval_tidy(var3, df))
  return(out)
}

> myFun3(df)
# A tibble: 6 × 1
  var_out
  <chr>  
1 A      
2 B      
3 C      
4 D      
5 E      
6 F   
   
> myFun3(df, col3, col4)  # For when I have column names that aren't my defaults
# A tibble: 6 × 1
  var_out
  <chr>  
1 G      
2 H      
3 I      
4 J      
5 K      
6 L      

If I throw a print(var3) into the function and rerun it, I can see the expression is c(~col1, ~col2), and I initially thought I could shorten the function like this:

myFun4 <- function(df, var1 = col1, var2 = col2){
  var3 <- expr(c(ensym(var1), ensym(var2)))   # ditched the enquo() and tried directly inserting parameter values with ensym()
  print(var3)
  out <- tibble(var_out = eval_tidy(var3, df))
  return(out)
}

> myFun4(df)
c(ensym(var1), ensym(var2))
# A tibble: 2 × 1
  var_out
  <list> 
1 <sym>  
2 <sym>  

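As you can see above, this fails: expr() captures the ensym() calls without evaluating them, so eval_tidy() only gets as far as running ensym(), which returns the symbols col1 and col2, and those symbols are never themselves looked up in df. I was closer before, when I kept var1 and var2 as quosures.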
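
To make the difference concrete, here is a minimal sketch that compares what expr() captures with and without !! in front of ensym(). The helper name show_capture is hypothetical and exists only for this illustration; it is not part of rlang.

```r
library(rlang)

# Hypothetical helper, for illustration only: returns the two captured
# expressions so they can be compared side by side.
show_capture <- function(var1 = col1) {
  list(
    # Without !!, expr() keeps the ensym() call itself, unevaluated.
    quoted = expr(c(ensym(var1))),
    # With !!, ensym(var1) runs first and its result (the symbol col1)
    # is inlined into the captured expression.
    injected = expr(c(!!ensym(var1)))
  )
}

res <- show_capture()
print(res$quoted)
print(res$injected)
```

The first expression still contains the ensym() call, while the second contains the bare column name, which is the only form that eval_tidy() can resolve against df.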

Guidance Sought

Can myFun3() be written more concisely? I'm focused on learning how this works in rlang, but the problem shown above is fundamentally a base R one (concatenating two columns of a dataframe). In other circumstances I am writing functions around dplyr verbs, so staying with tidy evaluation seems appropriate here, but maybe I should be doing this with base R NSE instead. Would that make a difference? Thank you for any clarity on my efforts above.


Solution

  • Here are a couple of options to get the ball rolling. You could turn the unquoted variable names into strings and then use df[[var1]] or something like that. That's what I do in myFun3() below.

    library(tidyverse)
    library(rlang)
    #> 
    #> Attaching package: 'rlang'
    #> The following objects are masked from 'package:purrr':
    #> 
    #>     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
    #>     flatten_raw, invoke, splice
    
    df <- tibble(col1 = c("A", "B", "C"),
                 col2 = c("D", "E", "F"),
                 col3 = c("G", "H", "I"),
                 col4 = c("J", "K", "L"))
    
    
    myFun3 <- function(df, var1 = col1, var2 = col2){
      var1 <- as_label(enquo(var1))
      var2 <- as_label(enquo(var2))
      out <- tibble(var_out = c(df[[var1]], df[[var2]]))
      return(out)
    }
    
    myFun3(df)
    #> # A tibble: 6 × 1
    #>   var_out
    #>   <chr>  
    #> 1 A      
    #> 2 B      
    #> 3 C      
    #> 4 D      
    #> 5 E      
    #> 6 F
    

    You also don't have to leave the dplyr world, because for this particular problem you could use reframe() to make the new dataset. Whether this works as well in your intended real-world scenario, I'm not sure. That's what I do in myFun3b() below:

    myFun3b <- function(df, var1 = col1, var2 = col2){
      out <- df %>% reframe(var_out = c({{var1}}, {{var2}}))
      return(out)
    }
    
    myFun3b(df)
    #> # A tibble: 6 × 1
    #>   var_out
    #>   <chr>  
    #> 1 A      
    #> 2 B      
    #> 3 C      
    #> 4 D      
    #> 5 E      
    #> 6 F
    

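    You could also inject (!!) the result of ensym() into a with(df, ...) call, as in myFun3c(). Note that with() is base R and does not understand !! on its own; this works because the call sits inside tibble(), which captures its arguments with tidy evaluation and processes the !! before with() is evaluated.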

    myFun3c <- function(df, var1 = col1, var2 = col2){
      out <- tibble(var_out = with(df, c(!!ensym(var1), !!ensym(var2))))
      return(out)
    }
    myFun3c(df)
    #> # A tibble: 6 × 1
    #>   var_out
    #>   <chr>  
    #> 1 A      
    #> 2 B      
    #> 3 C      
    #> 4 D      
    #> 5 E      
    #> 6 F
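
    On the question of base R NSE: a rough equivalent can be sketched with substitute() and eval(). The names myFunBase and df_base are hypothetical, and the sketch uses a plain data.frame so that it runs without any packages. Treat it as one possible shape, not necessarily a better one than the tidy eval versions above.

    ```r
    # Hypothetical base R variant: no rlang or tidyverse needed.
    myFunBase <- function(df, var1 = col1, var2 = col2) {
      # substitute() returns the unevaluated argument expression
      # (or the default expression when the argument is not supplied).
      e1 <- substitute(var1)
      e2 <- substitute(var2)
      # Evaluate each expression with df's columns in scope, falling back
      # to the caller's environment for anything that is not a column.
      env <- parent.frame()
      data.frame(var_out = c(eval(e1, df, env), eval(e2, df, env)),
                 stringsAsFactors = FALSE)
    }

    df_base <- data.frame(col1 = c("A", "B", "C"),
                          col2 = c("D", "E", "F"),
                          col3 = c("G", "H", "I"),
                          col4 = c("J", "K", "L"),
                          stringsAsFactors = FALSE)

    out_default <- myFunBase(df_base)            # uses col1 and col2
    out_custom <- myFunBase(df_base, col3, col4)
    ```

    The trade-off is that base R has no built-in counterpart to quosures, so the enclosing environment has to be passed along by hand (parent.frame() here), which is the bookkeeping that enquo() and eval_tidy() take care of.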
    

    Created on 2023-08-16 with reprex v2.0.2