
Different methods for passing column names to custom functions - {{}} vs. .data[[]]


I am struggling to get my head around creating custom R functions that take data frame columns as inputs. As far as I can tell, there are two broad ways of doing this, either by passing the column name directly (making use of non-standard evaluation?) or as a string, which changes the way you refer to it within a function. Both methods work in simple cases:

library(dplyr)
library(ggplot2)

#sample data
data(msleep)

#function to plot two columns, directly entering column names
plot_dir <- function(df, x, y){
  df %>% 
    ggplot(aes({{x}}, {{y}})) +
    geom_point() 
}

#function to plot two columns, entering column names as strings
plot_str <- function(df, x, y){
  df %>% 
    ggplot(aes(.data[[x]], .data[[y]])) +
    geom_point() 
}

#these give the same output:
plot_dir(df=msleep, x=sleep_total, y=awake)
plot_str(df=msleep, x='sleep_total', y='awake')

I've been told that the former is better practice and more versatile, and it also tends to simplify my coding, so I tend to prefer it. And in cases where you need to access the column name as a string, you can use yStr <- deparse(substitute(y)).
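As a minimal sketch of the deparse(substitute()) idea mentioned above: the helper name plot_titled below is hypothetical (it is not part of the original post) and simply reuses the y column name as the plot title. Note that this only recovers the name when the function is called directly with a bare column name; it is an illustration, not a general recipe.

```r
library(ggplot2)

# Hypothetical helper: plots two bare column names and uses the
# y column name, recovered as a string, for the plot title.
plot_titled <- function(df, x, y) {
  # Capture the unevaluated argument and turn it into a string,
  # e.g. "awake" when called with y = awake
  y_str <- deparse(substitute(y))
  ggplot(df, aes({{ x }}, {{ y }})) +
    geom_point() +
    ggtitle(y_str)
}

# msleep ships with ggplot2
p <- plot_titled(msleep, sleep_total, awake)
```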

However, there are some cases when I get stuck needing to enter a variable as a string, such as providing a list of columns, as in the case below. Is there an alternate way to do this to maintain the {{}} syntax, either by using a different method to supply the column names, or by converting back from a string [sort of the opposite of deparse(substitute())]?

ylist <- c('awake', 'sleep_rem')

plot_list <- function(df, x, ylist){
  plotout <- list()
  for (yy in ylist){
    tmpplot <- ggplot(df, aes({{x}}, .data[[yy]])) + ##Note mixed methods here
        geom_point() 
    plotout[[yy]] <- tmpplot
  }
  return(plotout)
}

plotout <- plot_list(msleep, sleep_total, ylist)

I'd also like to know if there are specific terms for these two methods (column name with {{}} vs. column name string with .data[[]]) of function writing.


Solution

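  • Use sym() (from rlang) to convert each string into a symbol, which can then be passed through {{ }} like a bare column name: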

    plot_list <- function(df, x, ylist){
      plotout <- list()
      for (yy in ylist){
        ys <- sym(yy)
        tmpplot <- ggplot(df, aes({{x}}, {{ys}})) + 
            geom_point() 
        plotout[[yy]] <- tmpplot
      }
      return(plotout)
    }
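A possible variation on the answer above, offered as a sketch rather than a definitive implementation: syms() converts the whole character vector at once, and !! injects each symbol into aes(). The name plot_list2 and the use of lapply() are assumptions made for illustration; x is captured once with enquo() so that it can be reused safely inside the anonymous function.

```r
library(ggplot2)
library(rlang)

# Sketch: same idea as the loop version, built with lapply().
# plot_list2 is a hypothetical name, not from the original answer.
plot_list2 <- function(df, x, ylist) {
  x_quo <- enquo(x)  # capture the bare column name once
  plots <- lapply(syms(ylist), function(ys) {
    # ys is a symbol such as `awake`; !! injects it into aes()
    ggplot(df, aes(!!x_quo, !!ys)) +
      geom_point()
  })
  # Name each plot after the column it shows
  setNames(plots, ylist)
}

plotout <- plot_list2(msleep, sleep_total, c("awake", "sleep_rem"))
```

Called this way, x is still supplied as a bare column name while the y columns stay in a character vector, so the interface matches the original plot_list().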