
R: Pass character vector of column names to function which can optionally take multiple arguments


My specific issue involves the function tabyl from the janitor package, but it may also be relevant to other cases.

I am trying to pass a character vector of column names as arguments to a custom function that creates a summary table. I should be able to pass 1, 2 or 3 names, as tabyl accepts any of those. However, tabyl only recognises my vector as a single argument, using the first element and ignoring the others.

The reason I prefer to feed a vector to my custom function instead of individual names is because I am doing other manipulations on the vector.

I have tried using do.call to pass the vector elements as separate arguments, as in other solutions, but I struggle to combine it with the tidyselect helper all_of, which I presumably need so that the character elements are interpreted as column names.

My code is currently:

library("tidyverse")
library("janitor")

table_func <- function(data=NULL,
                       vars=NULL) {
  
  data %>% tabyl(all_of(vars))

  # Other code involving "all_of(vars)"

}

dat <- tibble(x=c(1:4),y=c(5:8))

vc <- c("x","y")

table_func(dat,vc)

Current output:

 x n percent
 1 1    0.25
 2 1    0.25
 3 1    0.25
 4 1    0.25

Desired output:

 x 5 6 7 8
 1 1 0 0 0
 2 0 1 0 0
 3 0 0 1 0
 4 0 0 0 1

Solution

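  • This would work if tabyl() were dplyr::select(), but tabyl() does not accept tidyselect syntax for several columns at once: its data-frame method takes up to three separate column arguments (var1, var2, var3), so all_of(vars) fills only the var1 slot and only the first column is tabulated.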
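To see why only one column survives, here is a hypothetical stand-in, show_slots(), that mimics the argument shape of tabyl()'s data-frame method (separate var1, var2 and var3 arguments, no ... for extra columns) and simply reports what it captured. It is an illustrative sketch, not janitor's code, and assumes rlang is installed.

```r
library(rlang)

# Hypothetical stand-in with the same argument shape as tabyl()'s
# data-frame method: one slot per column, no `...` for extra columns.
show_slots <- function(dat, var1, var2, var3) {
  list(
    var1 = enexpr(var1),          # the unevaluated expression passed as var1
    var2_missing = missing(var2), # was anything passed as var2?
    var3_missing = missing(var3)  # was anything passed as var3?
  )
}

vars <- c("x", "y")
res <- show_slots(NULL, all_of(vars))
res$var1          # all_of(vars): the whole selection sits in the var1 slot
res$var2_missing  # TRUE: nothing reached var2
```

all_of() is never evaluated here, so tidyselect does not need to be loaded to run the sketch.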

    base R approach

    When I get into this kind of tidy evaluation pit of doom, I try to claw my way out with base R.

    You want to call tabyl(data, ...), where ... stands for the elements of vars, each supplied as a separate argument. You can do this with do.call(), which "constructs and executes a function call from a name or a function and a list of arguments to be passed to it".

    The other issue is that you are supplying a character vector, but we need names, i.e. tabyl(data, x, y) rather than tabyl(data, "x", "y"). So we use as.name() on each element of vars.

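    table_func <- function(data, vars) {
        # Same as do.call(tabyl, ...): builds and runs tabyl(data, x, y),
        # with each column name converted from a string to a symbol
        tabyl |>
            do.call(c(list(data), lapply(vars, as.name)))
    }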
    
    table_func(dat, vc)
    #  x 5 6 7 8
    #  1 1 0 0 0
    #  2 0 1 0 0
    #  3 0 0 1 0
    #  4 0 0 0 1
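If it helps to see what do.call() actually runs, the same argument list can be turned into an unevaluated call with base R's as.call(). This is only an inspection aid: it uses the symbol dat for readability, whereas do.call() inserts the data frame itself into the call.

```r
vars <- c("x", "y")

# as.name() turns each string into a symbol, i.e. a bare column name
args <- lapply(vars, as.name)

# Build (but do not evaluate) the call that do.call() would construct
cl <- as.call(c(list(as.name("tabyl"), as.name("dat")), args))
cl
#> tabyl(dat, x, y)
```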
    

    tidyverse approach

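    If you want to stick with the tidyverse, you can use !!!, the splice operator. Note that:

    Most tidyverse functions support !!! out of the box. With base functions you need to use inject() to enable !!!.

    tabyl() is in the same position as a base function here: it captures var1, var2 and var3 one at a time rather than through ..., so it cannot splice !!! itself and the call has to be wrapped in inject().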
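A minimal illustration of what inject() does, using base paste(), which knows nothing about !!!: inject() splices the list into the call before paste() is evaluated. Assumes rlang is installed.

```r
library(rlang)

parts <- list("a", "b", "c")

# inject() rewrites this to paste("a", "b", "c", sep = "-") before running it
inject(paste(!!!parts, sep = "-"))
#> [1] "a-b-c"
```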

    library(rlang)
    table_func <- function(data, vars) {
        data |>
            tabyl(!!!parse_exprs(vars)) |>
            inject()
    }
    
    table_func(dat, vc)
    #  x 5 6 7 8
    #  1 1 0 0 0
    #  2 0 1 0 0
    #  3 0 0 1 0
    #  4 0 0 0 1
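One caveat: parse_exprs() parses each string as R code, so it only works for syntactic column names. rlang::syms() converts strings to symbols without parsing them, which also handles names such as "my col". A sketch of that variant (the name table_func_syms is just for illustration; calling it requires janitor to be installed):

```r
library(rlang)

# Variant of table_func() that turns strings into symbols instead of parsing them
table_func_syms <- function(data, vars) {
  inject(janitor::tabyl(data, !!!syms(vars)))
}

vars <- c("x", "my col")

# syms() keeps "my col" as a single (non-syntactic) symbol
cl <- expr(tabyl(data, !!!syms(vars)))
cl
#> tabyl(data, x, `my col`)

# parse_exprs() treats "my col" as code, which is a syntax error
try(parse_exprs(vars))
```

With janitor installed, table_func_syms(dat, vc) should print the same two-way table as the versions above.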
    

    A note on pipes

    As you noted in your comment, if you want to use the magrittr pipe, the above will not work:

    table_func <- function(data, vars) {
        data %>%
            tabyl(!!!parse_exprs(vars)) %>%
            inject()
    }
    table_func(dat, vc)
    # Error in `rlang::enquo()`:
    # ! Can't use `!!!` at top level.
    # Run `rlang::last_trace()` to see where the error occurred.
    

    Let's compare abstract syntax trees (ASTs), starting with an equivalent version of the function that uses no pipes:

    table_func_pipeless <- function(data, vars) {
        inject(tabyl(data, !!!parse_exprs(vars)))
    }
    

    Its AST (with the column names spliced in) is exactly the same as the AST produced with the native pipe:

    █─inject 
    └─█─tabyl 
      ├─data 
      ├─x 
      └─y 
    

    However, if we pass the magrittr version to lobstr::ast(), we get:

    █─`%>%` 
    ├─█─`%>%` 
    │ ├─data 
    │ └─█─tabyl 
    │   ├─x 
    │   └─y 
    └─█─inject 
    

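    This is because %>% is an ordinary function call: inject() ends up as an argument to the outer %>% instead of wrapping tabyl(), so tabyl() receives the unspliced !!! and its rlang::enquo() call fails. The native pipe, by contrast, is rewritten by the parser into exactly the nested call in table_func_pipeless().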
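The same point can be checked without lobstr, assuming R >= 4.1 for |>: quote() returns the parsed expression, and for the native pipe that is already the nested call. Only base R is needed, since nothing is evaluated (tabyl, inject and %>% never have to exist).

```r
# The native pipe is rewritten by the parser, so the quoted expression
# is already the nested call
native <- quote(data |> tabyl(x, y) |> inject())
nested <- quote(inject(tabyl(data, x, y)))
identical(native, nested)
#> [1] TRUE

# magrittr's %>% is an ordinary function, so it survives parsing
# as a call to `%>%`
piped <- quote(data %>% tabyl(x, y) %>% inject())
identical(piped, nested)
#> [1] FALSE
piped[[1]]
#> `%>%`
```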

    I would just use the base R pipe. As Hadley Wickham states:

    Luckily there’s no need to commit entirely to one pipe or the other — you can use the base pipe for the majority of cases where it’s sufficient and use the magrittr pipe when you really need its special features.

    If you really want to use the magrittr pipe, you can get the desired output by wrapping the entire expression in inject(), so that the splice happens before %>% is ever called.

    table_func_magrittr <- function(data, vars) {
        inject(data %>% tabyl(!!!parse_exprs(vars)))
    }
    
    table_func_magrittr(dat, vc)
    #  x 5 6 7 8
    #  1 1 0 0 0
    #  2 0 1 0 0
    #  3 0 0 1 0
    #  4 0 0 0 1
    

    For more, see: What are the differences between R's native pipe `|>` and the magrittr pipe `%>%`?