rdplyrrlangtidyevalr-glue

Dynamic variables names in dplyr function across multiple columns


I am trying to write a function that uses dplyr::summarise to obtain means of multiple columns of a data frame and assign dynamic names to the summarised columns using the new rlang glue syntax and := operator.

Here's a simple example of my problem using the mtcars dataset.

When summarising over just one column - the glue syntax works (i.e. the summarised column name is mean_mpg):

mean_fun <- function(data, group_cols, summary_col) {
 data %>%
 group_by(across({{ group_cols }})) %>%
 summarise("mean_{{ summary_col }}" := mean({{ summary_col }}, na.rm = T))
}
mean_fun(mtcars, c(cyl, gear), mpg)

   cyl  gear mean_mpg
  <dbl> <dbl>    <dbl>
1     4     3     21.5
2     4     4     26.9
3     4     5     28.2
4     6     3     19.8
5     6     4     19.8
6     6     5     19.7
7     8     3     15.0
8     8     5     15.4

But the equivalent does not name the cols properly when summarising over multiple columns:

mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_cols }}" := across({{ summary_cols }}, ~ mean(., na.rm = T)))
}
mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))

    cyl  gear `mean_c(mpg, wt)`$mpg   $wt
  <dbl> <dbl>                 <dbl> <dbl>
1     4     3                  21.5  2.46
2     4     4                  26.9  2.38
3     4     5                  28.2  1.83
4     6     3                  19.8  3.34
5     6     4                  19.8  3.09
6     6     5                  19.7  2.77
7     8     3                  15.0  4.10
8     8     5                  15.4  3.37

How can I get the summarised column names to read mean_mpg and mean_wt? And why does this not work?

I realise that there are likely many other ways to perform this task but I would like to know how to get this method (i.e. using tidy eval, rlang syntax in a bespoke function) to work for teaching purposes and my own understanding!

Thank you


Solution

  • We could use .names in across to rename

    mean_fun_multicols <- function(data, group_cols, summary_cols) {
      data %>%
        group_by(across({{group_cols}})) %>%
         summarise(across({{ summary_cols }},
             ~ mean(., na.rm = TRUE), .names = "mean_{.col}"), .groups = "drop")
    }
    

    -testing

    mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
    # A tibble: 8 × 4
        cyl  gear mean_mpg mean_wt
      <dbl> <dbl>    <dbl>   <dbl>
    1     4     3     21.5    2.46
    2     4     4     26.9    2.38
    3     4     5     28.2    1.83
    4     6     3     19.8    3.34
    5     6     4     19.8    3.09
    6     6     5     19.7    2.77
    7     8     3     15.0    4.10
    8     8     5     15.4    3.37
    

    NOTE: The := is mainly used when there is a single column in tidyverse


    If we use the OP's function, we are assigning multiple columns to a single column and this returns a tibble instead of a normal column. We may need to unpack

    library(tidyr)
    > mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% str
    `summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
    grouped_df [8 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
     $ cyl            : num [1:8] 4 4 4 6 6 6 8 8
     $ gear           : num [1:8] 3 4 5 3 4 5 3 5
     $ mean_c(mpg, wt): tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
      ..$ mpg: num [1:8] 21.5 26.9 28.2 19.8 19.8 ...
      ..$ wt : num [1:8] 2.46 2.38 1.83 3.34 3.09 ...
     - attr(*, "groups")= tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
      ..$ cyl  : num [1:3] 4 6 8
      ..$ .rows: list<int> [1:3] 
      .. ..$ : int [1:3] 1 2 3
      .. ..$ : int [1:3] 4 5 6
      .. ..$ : int [1:2] 7 8
      .. ..@ ptype: int(0) 
      ..- attr(*, ".drop")= logi TRUE
    
    > mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% 
            unpack(where(is_tibble))
    `summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
    # A tibble: 8 × 4
    # Groups:   cyl [3]
        cyl  gear   mpg    wt
      <dbl> <dbl> <dbl> <dbl>
    1     4     3  21.5  2.46
    2     4     4  26.9  2.38
    3     4     5  28.2  1.83
    4     6     3  19.8  3.34
    5     6     4  19.8  3.09
    6     6     5  19.7  2.77
    7     8     3  15.0  4.10
    8     8     5  15.4  3.37