Every time I think I've figured out the details of passing data frame columns into functions, I find a new situation that complicates the process.
I have a custom function in which I'm passing the data frame columns using curly brackets {{}}. This works great for calling them as part of dplyr sequences, as shown in sampfun1 below. However, if I want to use a very simple function on a single column (for example, sd(mtcars$disp)), I run into difficulties, as it does not seem possible to use the curly brackets directly on the data frame (df${{col}} or any similar alternative I've tried).
Right now I'm getting around this by using df[[deparse(substitute(col))]], as shown in sampfun2 below. This is fine, but it's a bit clunky, especially in complex functions where multiple columns are passed in and then used in different ways. Is there a simpler way to achieve the output of sampfun2? I know I could just pass the column name as a string and go directly to df[[col]], but I'd like to avoid that since I'm using the column in other ways elsewhere in the function.
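library(dplyr)

# Works: {{ }} forwards the column into a dplyr verb
sampfun1 <- function(df, col){
  df %>%
    mutate(xsd = sd({{col}}))
}

# Workaround: turn the bare column name into a string, then index with [[ ]]
sampfun2 <- function(df, col){
  colStr <- deparse(substitute(col))
  dat_sd <- sd(df[[colStr]])
}

disp_sd1 <- sampfun1(mtcars, disp)
disp_sd2 <- sampfun2(mtcars, disp)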
EDIT for clarification: This is a very simplified function just to display the issue of passing a column into a function and then calling just the column (rather than e.g. something through dplyr that calls first the data frame and then the function). My goal isn't to pass a large number of columns to the same function, just to simplify the syntax if I need to repeatedly call that column in different contexts. When calling a subset of the data frame using dplyr, this isn't a problem - it only arises when trying to extract the column. Here is another example to maybe better illustrate what I'm trying to do:
sampfun3 <- function(df, col){
  single_col <- df %>% select({{col}}) %>% pull()
  dat_sd <- sd(single_col)
}
This also works for what I'm trying to do, though it's a little more cumbersome than sampfun2. I was just wondering if there's a simpler way to extract a specific column when it's been passed using {{}}.
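For the case where the same column is needed both as a plain vector and inside dplyr verbs, one option is to capture its name once as a string and reuse it. This is only a sketch: it assumes the rlang package is installed, and sampfun_name, col_name, base_sd and dplyr_sd are hypothetical names made up for illustration. rlang::ensym() accepts a bare column name (or a string) and errors on anything more complex, which is stricter than deparse(substitute(col)).

```r
library(dplyr)

# Hypothetical helper: capture the column name once, then reuse it
sampfun_name <- function(df, col) {
  # "disp" whether the caller writes disp or "disp"
  col_name <- rlang::as_name(rlang::ensym(col))

  # Base R extraction with [[ ]]
  base_sd <- sd(df[[col_name]])

  # The same string also works inside dplyr verbs via the .data pronoun
  dplyr_sd <- df %>%
    summarise(xsd = sd(.data[[col_name]])) %>%
    pull(xsd)

  c(base = base_sd, dplyr = dplyr_sd)
}

res <- sampfun_name(mtcars, disp)
```

Both elements of res should be the same value, since they compute the standard deviation of the same column in two ways.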
More approaches:
sampfun3 <- function(df, col) {
  df |> pull({{col}}) |> sd()
}
> sampfun3(mtcars, disp)
[1] 123.9387
sampfun4 <- function(df, col){
  df |> summarize(across({{col}}, ~sd(.x)))
}
> sampfun4(mtcars, disp)
      disp
1 123.9387
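Another possibility, if you would rather not go through dplyr at all, is to evaluate the captured column against the data frame directly. The sketch below assumes the rlang package is installed; sampfun_eval and col_vec are hypothetical names used only for illustration. rlang::enquo() captures what the caller typed and rlang::eval_tidy() evaluates it with df as the data mask, so the result is the column as a plain vector.

```r
# Hypothetical helper: evaluate the captured column inside the data frame
sampfun_eval <- function(df, col) {
  # enquo() captures the expression; eval_tidy() looks it up in df
  col_vec <- rlang::eval_tidy(rlang::enquo(col), data = df)
  sd(col_vec)
}

disp_sd <- sampfun_eval(mtcars, disp)

# Because the expression is evaluated in df, computed columns work too
ratio_sd <- sampfun_eval(mtcars, disp / cyl)
```

Here disp_sd should agree with sd(mtcars$disp). The trade-off is that this accepts any expression, not just a single column name, which may or may not be what you want in a function that is meant to take exactly one column.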