I am struggling to get my head around creating custom R functions that take data frame columns as inputs. As far as I can tell, there are two broad ways of doing this, either by passing the column name directly (making use of non-standard evaluation?) or as a string, which changes the way you refer to it within a function. Both methods work in simple cases:
```r
library(dplyr)
library(ggplot2)

# sample data
data(msleep)

# function to plot two columns, directly entering column names
plot_dir <- function(df, x, y) {
  df %>%
    ggplot(aes({{ x }}, {{ y }})) +
    geom_point()
}

# function to plot two columns, entering column names as strings
plot_str <- function(df, x, y) {
  df %>%
    ggplot(aes(.data[[x]], .data[[y]])) +
    geom_point()
}

# these give the same output:
plot_dir(df = msleep, x = sleep_total, y = awake)
plot_str(df = msleep, x = 'sleep_total', y = 'awake')
```
I've been told that the former is better practice and more versatile, and it also tends to simplify my code, so I tend to prefer it. In cases where I need the column name as a string, I can use `yStr <- deparse(substitute(y))`.
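For reference, rlang has a tidyverse-style counterpart to `deparse(substitute(y))`: `as_label(enquo(y))` (or `as_name(ensym(y))` if `y` must be a bare name). A minimal sketch, where `col_label()` and `plot_dir_labelled()` are hypothetical helpers made up for illustration:

```r
library(ggplot2)
library(rlang)

# Hypothetical helper: returns the column name the caller typed, as a string.
col_label <- function(y) {
  as_label(enquo(y))
}

col_label(awake)
#> [1] "awake"

# Hypothetical variant of plot_dir() that also uses the name as a string.
# enquo() does not evaluate y, so {{ y }} can still be used afterwards.
plot_dir_labelled <- function(df, x, y) {
  y_str <- as_label(enquo(y))
  ggplot(df, aes({{ x }}, {{ y }})) +
    geom_point() +
    ggtitle(paste("Plot of", y_str))
}

p <- plot_dir_labelled(msleep, sleep_total, awake)
```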
However, there are some cases where I get stuck needing to enter a variable as a string, such as when providing a vector of column names, as in the case below. Is there an alternative way to do this that keeps the `{{}}` syntax, either by using a different method to supply the column names, or by converting back from a string (sort of the opposite of `deparse(substitute())`)?
```r
ylist <- c('awake', 'sleep_rem')

plot_list <- function(df, x, ylist) {
  plotout <- list()
  for (yy in ylist) {
    tmpplot <- ggplot(df, aes({{ x }}, .data[[yy]])) + ## Note mixed methods here
      geom_point()
    plotout[[yy]] <- tmpplot
  }
  return(plotout)
}

plotout <- plot_list(msleep, sleep_total, ylist)
```
I'd also like to know whether there are specific terms for these two styles of function writing (bare column name with `{{}}` vs. column-name string with `.data[[]]`).
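As far as I know, rlang's documentation calls `{{ }}` "embracing" (informally "curly-curly") and `.data` the ".data pronoun"; both are tools for what the tidyverse docs call data masking, and `!!` is described as "injection". A small sketch, assuming only dplyr, ggplot2 (for `msleep`) and rlang, comparing the three on `summarise()`; `col` and `mean_of()` are hypothetical names for illustration:

```r
library(dplyr)
library(ggplot2)   # provides the msleep data set
library(rlang)

col <- "awake"

# 1. .data pronoun, indexed with a string
via_pronoun <- msleep %>% summarise(m = mean(.data[[col]], na.rm = TRUE))

# 2. string converted to a symbol with sym(), then injected with !!
via_sym <- msleep %>% summarise(m = mean(!!sym(col), na.rm = TRUE))

# 3. embracing a bare-name argument with {{ }}
mean_of <- function(df, var) {
  df %>% summarise(m = mean({{ var }}, na.rm = TRUE))
}
via_embrace <- mean_of(msleep, awake)

# Going the other way: a symbol back to a string
as_name(sym(col))
#> [1] "awake"
```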
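Use `sym()` (from rlang, re-exported by dplyr) to convert each string into a symbol, which can then be embraced with `{{}}` just like `x`: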
```r
plot_list <- function(df, x, ylist) {
  plotout <- list()
  for (yy in ylist) {
    ys <- sym(yy)  # convert the string to a symbol
    tmpplot <- ggplot(df, aes({{ x }}, {{ ys }})) +
      geom_point()
    plotout[[yy]] <- tmpplot
  }
  return(plotout)
}
```
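If you would rather not pass strings at all, one option (a sketch, not the only way) is to take the y columns as bare names through `...` and capture them with rlang's `enquos()`. `plot_list_dots()` is a hypothetical name; `.named = TRUE` auto-names each quosure from its expression, so the list names match the column names:

```r
library(ggplot2)
library(rlang)

# Hypothetical variant of plot_list(): y columns passed as bare names via `...`
plot_list_dots <- function(df, x, ...) {
  x_q  <- enquo(x)                    # capture x once, as a quosure
  y_qs <- enquos(..., .named = TRUE)  # one named quosure per y column
  lapply(y_qs, function(y_q) {
    ggplot(df, aes(!!x_q, !!y_q)) +   # inject the captured quosures
      geom_point()
  })
}

# Bare names, keeping the {{}}-style calling convention
plotout <- plot_list_dots(msleep, sleep_total, awake, sleep_rem)
names(plotout)  # c("awake", "sleep_rem")

# A character vector can still be supplied by splicing symbols with !!!
plotout2 <- plot_list_dots(msleep, sleep_total, !!!syms(c("awake", "sleep_rem")))
```

Capturing `x` once outside `lapply()` avoids relying on `{{ x }}` inside the anonymous function, where `x` is not one of its own arguments.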