Tags: r, dplyr

How can I use the across() function in dplyr to apply custom functions that return multiple columns?


I’m exploring the across() function, introduced in dplyr 1.0.0, and I'm trying to understand how to use it with a custom function that returns multiple columns. Specifically, I want to apply a function that calculates both the mean and the standard deviation of selected numeric columns in my data frame and returns them as separate columns.

For example, given the following data frame:

library(dplyr)

df <- data.frame(
  Group = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)

I want to create a new data frame that includes the mean and standard deviation for each value column, something like this:

  Group  Mean_Value1  SD_Value1  Mean_Value2  SD_Value2
1     a    9.812      2.034      4.955       1.085
2     b   10.231      1.987      5.023       0.923
3     c   10.032      2.121      4.998       1.098

I’ve tried the following approach but I’m not sure how to make it work properly with across():

df_summary <- df %>%
  group_by(Group) %>%
  summarise(across(starts_with("Value"), ~ c(mean = mean(.), sd = sd(.))))

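This doesn't give me what I want. across() produces one output column per input column, so the two statistics end up as two rows per group instead of separate columns. Recent dplyr versions also warn that returning more than one row per group from summarise() is deprecated.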

My specific questions are:

  1. How can I modify this approach to properly use across() for functions that return multiple values?
  2. Is there a better way to achieve this using dplyr or another package in R?
  3. What are the limitations of across() when dealing with custom functions like this?

Any guidance on how to accomplish this would be greatly appreciated!


Solution

  • To address

    Is there a better way to achieve this using dplyr or another package in R?

    Several packages provide grouped-summary functions like this (dplyr and data.table among them). If "better" means without external packages, base R's aggregate() can do it:

    # \(x) is R's shorthand lambda syntax (requires R >= 4.1.0)
    aggregate(df[grepl("Value", names(df))], df["Group"], \(x) c(Mean=mean(x), SD=sd(x)))
    

    giving

      Group Value1.Mean Value1.SD Value2.Mean Value2.SD
    1     a   10.901248  2.365063   4.5826417 0.8582879
    2     b    9.358671  2.549811   4.9142623 1.0512226
    3     c   11.040255  1.491652   5.2339545 1.0130163
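    The Value1.Mean-style headers appear because aggregate() stores each function's result as a two-column matrix inside a single column (Value1, Value2). The sketch below shows that structure and one common way to split it into ordinary columns with do.call(data.frame, ...), then renames them to the Mean_Value1 style from the question. The set.seed() call is an assumption added only to make the run reproducible, so the numbers will differ from the output above.

```r
# Assumes R >= 4.1.0 for the \(x) lambda syntax.
set.seed(42)  # hypothetical seed, only for reproducibility
df <- data.frame(
  Group = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)

res <- aggregate(df[grepl("Value", names(df))], df["Group"],
                 \(x) c(Mean = mean(x), SD = sd(x)))

# Each Value column actually holds a 2-column matrix (Mean, SD).
names(res)             # "Group" "Value1" "Value2"
is.matrix(res$Value1)  # TRUE

# data.frame() expands each named matrix into Value1.Mean, Value1.SD, ...
flat <- do.call(data.frame, res)

# Optional: reorder name parts to Mean_Value1, SD_Value1, ...
names(flat) <- sub("^(Value\\d+)\\.(Mean|SD)$", "\\2_\\1", names(flat))
flat
```

    After this step the result is a plain data frame with one ordinary column per statistic.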
    

    This might be an alternative if you don't mind the way aggregate() names the resulting columns (Value1.Mean rather than Mean_Value1).
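    For comparison, the across() attempt from the question can be made to work in dplyr (>= 1.0.0 assumed) by passing a named list of functions, each returning one value per group, instead of a single function returning a vector. The `.names` glue specification controls the output names. As in the previous sketch, set.seed() is a hypothetical addition for reproducibility.

```r
library(dplyr)

set.seed(42)  # hypothetical seed, only for reproducibility
df <- data.frame(
  Group = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)

df_summary <- df %>%
  group_by(Group) %>%
  summarise(
    across(
      starts_with("Value"),
      # One function per statistic, so each returns a single value per group
      list(Mean = mean, SD = sd),
      # Output names: function name first, then the source column name
      .names = "{.fn}_{.col}"
    ),
    .groups = "drop"
  )

df_summary
```

    across() loops over columns first and functions second, so the columns come out as Mean_Value1, SD_Value1, Mean_Value2, SD_Value2, matching the layout in the question. The underlying limitation is that inside summarise() each function passed to across() should return one value per group; a function returning a length-2 vector yields extra rows rather than extra columns. If you would rather keep a single function, dplyr 1.1.0 and later also accept a function returning a one-row data frame combined with `.unpack = TRUE` in across() (see ?across for the resulting naming rules).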