r dplyr

Select non-syntactic names of a data frame when name is a variable


I have a data frame in which some of the column names are non-syntactic. I want to select a range of these columns inside a function, where the non-syntactic names to select on are passed to the function as arguments.

Here is an example:

library(dplyr)

df <- tibble(a = "a", b = "b", `1` = "c", `2` = "d")

f <- function(df) {
  select(df, `1`:`2`)
}

g <- function(df, var) {
  select(df, all_of(var):`2`)
}

The output is:

> f(df)
# A tibble: 1 x 2
  `1`   `2`  
  <chr> <chr>
1 c     d    

> g(df, var = 1)
# A tibble: 1 x 4
  a     b     `1`   `2`  
  <chr> <chr> <chr> <chr>
1 a     b     c     d 

> g(df, var = `1`)
Error in `select()`:
i In argument: `all_of(var)`.
Caused by error:
! object '1' not found
Run `rlang::last_trace()` to see where the error occurred.

I am trying to implement the functionality of g, but I want the output of f (in this specific example).

It seems that by all_of(var) I am referencing the column index; I would rather make a reference to the non-syntactic name of the third column of the data frame. How can I reference a non-syntactic name in a select-call when that name is stored as a variable?


Solution

  • The issue is not related to the use of a non-syntactic column name, i.e. you will get the same error with a syntactic name:

    library(dplyr, warn.conflicts = FALSE)
    
    df <- tibble(a = "a", b = "b", `1` = "c", `2` = "d")
    
    g <- function(df, var) {
      select(df, all_of(var):`2`)
    }
    
    g(df, var = b)
    #> Error in `select()`:
    #> ℹ In argument: `all_of(var)`.
    #> Caused by error:
    #> ! object 'b' not found
    

    The real issue is that all_of() expects a character vector (or numeric positions), so var is evaluated as an ordinary R object, and no object named b (or `1`) exists. To pass an unquoted column name to a function, you need a quote-and-unquote pattern, which the curly-curly operator {{ performs in one step, i.e. you can do:

    g <- function(df, var) {
      select(df, {{ var }}:`2`)
    }
    
    g(df, var = `1`)
    #> # A tibble: 1 × 2
    #>   `1`   `2`  
    #>   <chr> <chr>
    #> 1 c     d
    

    The only special thing about non-syntactic column names is that we have to wrap them inside backticks.
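
If the column name is held as a string rather than passed unquoted, all_of() can stay as it is. The following is a minimal sketch, assuming the caller supplies the name as a character value; g_chr is a hypothetical name and not part of the original code:

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(a = "a", b = "b", `1` = "c", `2` = "d")

# g_chr is a hypothetical variant of g: var is a string holding the column name
g_chr <- function(df, var) {
  # all_of() matches by name here because var is a character vector
  select(df, all_of(var):`2`)
}

names(g_chr(df, var = "1"))
#> [1] "1" "2"
```

Passing the number 1 instead of the string "1" makes all_of() select by position, which is why g(df, var = 1) in the question returned all four columns.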