Tags: r, function, nse

Passing data frame columns into simple functions with NSE


Every time I think I've figured out the details of passing data frame columns into functions, I find a new situation that complicates the process.

I have a custom function in which I'm passing the data frame columns using curly brackets {{}}. This works great for calling them as part of dplyr sequences, as shown in sampfun1 below. However, if I want to use a very simple function on a single column (for example, sd(mtcars$disp)), I run into difficulties, as it does not seem possible to use the curly brackets directly on the data frame (df${{col}} or any similar alternative I've tried).

Right now I'm getting around this by using df[[deparse(substitute(col))]], as shown in sampfun2 below. This works, but it is a bit clunky, especially in complex functions where multiple columns are passed in and then used in different ways. Is there a simpler way to achieve the output of sampfun2? I know I could just pass the column name as a string and go directly to df[[col]], but I'd like to avoid that since I'm using the column in other ways elsewhere in the function.

library(dplyr)

sampfun1 <- function(df, col){
  df %>% 
    mutate(xsd = sd({{col}}))
}

sampfun2 <- function(df, col){
  colStr <- deparse(substitute(col))
  dat_sd <- sd(df[[colStr]])
}

disp_sd1 <- sampfun1(mtcars, disp)
disp_sd2 <- sampfun2(mtcars, disp)

EDIT for clarification: This is a very simplified function just to display the issue of passing a column into a function and then calling just the column (rather than e.g. something through dplyr that calls first the data frame and then the function). My goal isn't to pass a large number of columns to the same function, just to simplify the syntax if I need to repeatedly call that column in different contexts. When calling a subset of the data frame using dplyr, this isn't a problem - it only arises when trying to extract the column. Here is another example to maybe better illustrate what I'm trying to do:

sampfun3 <- function(df, col){
  single_col <- df %>% select({{col}}) %>% pull()
  dat_sd <- sd(single_col)
}

This also works for what I'm trying to do, though it's a little more cumbersome than sampfun2. I was just wondering if there's a simpler way to extract a specific column when it's been passed using {{}}.


Solution

  • More approaches:

    sampfun3 <- function(df, col) {
      df |> pull({{col}}) |> sd()
    }
    
    > sampfun3(mtcars, disp)
    [1] 123.9387
    
    
    
    sampfun4 <- function(df, col){
      df |> summarize(across( {{col}}, ~sd(.x)))
    }
    
    > sampfun4(mtcars, disp)
          disp
    1 123.9387
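
If the column is needed as a plain vector or as a string in several places inside the function, two further options can be sketched with rlang (installed alongside dplyr). This is only a sketch: the function names sampfun_name and sampfun_eval are made up for illustration, and ensym() assumes the caller passes a bare column name rather than an expression such as disp * 2.

```r
# Option A: capture the bare column name once as a string, then reuse it
# anywhere a string is accepted (df[[...]], plot labels, messages, ...).
sampfun_name <- function(df, col) {
  col_name <- rlang::as_name(rlang::ensym(col))  # e.g. "disp"
  sd(df[[col_name]])
}

# Option B: capture the argument as a quosure and evaluate it with df as
# the data mask, which returns the column itself as a vector.
sampfun_eval <- function(df, col) {
  values <- rlang::eval_tidy(rlang::enquo(col), data = df)
  sd(values)
}

sampfun_name(mtcars, disp)
sampfun_eval(mtcars, disp)
```

Both calls should give the same value as sd(mtcars$disp). Option A is closest to the deparse(substitute(col)) workaround from the question, while Option B also accepts expressions built from columns.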