Tags: r, dataframe, function, dplyr, r-glue

Refer to a quoted column name in a function in R


I want to use the na_omit function from the collapse package inside a user-defined function. na_omit takes the column name as a quoted string in its cols argument. If the name did not have to be quoted, I could refer to the column with double braces, {{col}}, as described in the vignette "Programming with dplyr". When I build the name with the glue package instead, e.g. glue::glue("{col}"), I get errors.

Here is a reprex:

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

library(collapse)
library(dplyr)
library(glue)

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}

my_func(my_df, color_code)

The expected output can be generated with the following:

my_df %>% 
  collapse::na_omit(cols = c("color_code")) 

and should produce:

#  color_code  color
#1        V9G   Blue
#2        J4C  White
#3        F7B Orange
#4        G3V  Green

How should I pass a column name that is a parameter of my user-defined function to an inner function that needs it as a quoted string?


Solution

  • In general, collapse mostly uses standard evaluation, and its non-standard evaluation (NSE) features are built on base R. Most rlang and glue constructs, such as {{ }}, therefore won't work, but the resulting code is simpler and faster. For NSE programming in base R, see http://adv-r.had.co.nz/Computing-on-the-language.html.

    As suggested by r2evans, for a single column, a solution would be:

    my_func <- function(df, col) { 
      col_char_ref <- as.character(substitute(col))
      df %>% 
        collapse::na_omit(cols = col_char_ref)
    }
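
    To make the capture step concrete, here is a minimal base R sketch of what as.character(substitute(col)) returns for different inputs. The helper name col_as_string is hypothetical and only isolates that one line of my_func.

    ```r
    # Hypothetical helper isolating the capture step used in my_func()
    col_as_string <- function(col) as.character(substitute(col))

    col_as_string(color_code)    # bare name
    #> [1] "color_code"
    col_as_string("color_code")  # a string literal passes through unchanged
    #> [1] "color_code"

    # Caveat: a variable holding the name is NOT evaluated;
    # the variable's own name is returned instead.
    x <- "color_code"
    col_as_string(x)
    #> [1] "x"
    ```

    So this pattern should accept the column either bare or quoted, but not a column name stored in another variable.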
    

    That is, use substitute() to capture the unevaluated argument, then as.character() or all.vars() to extract the variable names. For multiple columns, a general solution is to wrap fselect, e.g.

    library(collapse)
    my_func <- function(df, ...) {
      cols <- fselect(df, ..., return = "indices")
      na_omit(df, cols = cols) 
    }
    
    my_func(wlddev, PCGDP:GINI, POP) |> head()
    #>   country iso3c       date year decade                region
    #> 1 Albania   ALB 1997-01-01 1996   1990 Europe & Central Asia
    #> 2 Albania   ALB 2003-01-01 2002   2000 Europe & Central Asia
    #> 3 Albania   ALB 2006-01-01 2005   2000 Europe & Central Asia
    #> 4 Albania   ALB 2009-01-01 2008   2000 Europe & Central Asia
    #> 5 Albania   ALB 2013-01-01 2012   2010 Europe & Central Asia
    #> 6 Albania   ALB 2015-01-01 2014   2010 Europe & Central Asia
    #>                income  OECD    PCGDP LIFEEX GINI       ODA     POP
    #> 1 Upper middle income FALSE 1869.866 72.495 27.0 294089996 3168033
    #> 2 Upper middle income FALSE 2572.721 74.579 31.7 453309998 3051010
    #> 3 Upper middle income FALSE 3062.674 75.228 30.6 354950012 3011487
    #> 4 Upper middle income FALSE 3775.581 75.912 30.0 338510010 2947314
    #> 5 Upper middle income FALSE 4276.608 77.252 29.0 335769989 2900401
    #> 6 Upper middle income FALSE 4413.297 77.813 34.6 260779999 2889104
    

    Created on 2022-02-03 by the reprex package (v2.0.1)
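
    The all.vars() route mentioned above can also produce a quoted character vector from several bare names. A minimal sketch, assuming only plain column names are passed through ... (the names quoted_cols and my_func_multi are hypothetical, and the wrapper assumes collapse is installed):

    ```r
    # Hypothetical helper (not part of collapse):
    # bare names passed via ... -> character vector of names
    quoted_cols <- function(...) {
      # substitute(list(...)) yields the unevaluated call list(a, b, ...);
      # all.vars() returns the variable names in that call as strings
      all.vars(substitute(list(...)))
    }

    quoted_cols(color_code, color)
    #> [1] "color_code" "color"

    # Sketch of a multi-column wrapper; the collapse call is only
    # evaluated when the function is actually run
    my_func_multi <- function(df, ...) {
      collapse::na_omit(df, cols = quoted_cols(...))
    }
    ```

    Unlike the fselect wrapper, this does not expand ranges: for PCGDP:GINI, all.vars() returns only the two endpoint names. String literals are not variables, so they are dropped as well.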