I’m exploring the across() function introduced in recent versions of dplyr, and I'm trying to understand how to use it to apply a custom function that returns multiple columns. Specifically, I want to apply a function that calculates both the mean and standard deviation for selected numeric columns in my data frame and returns these as separate columns.
For example, given the following data frame:
library(dplyr)
df <- data.frame(
  Group = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)
I want to create a new data frame that includes the mean and standard deviation for each value column, something like this:
  Group Mean_Value1 SD_Value1 Mean_Value2 SD_Value2
1     a       9.812     2.034       4.955     1.085
2     b      10.231     1.987       5.023     0.923
3     c      10.032     2.121       4.998     1.098
I’ve tried the following approach, but I’m not sure how to make it work properly with across():
df_summary <- df %>%
  group_by(Group) %>%
  summarise(across(starts_with("Value"), ~ c(mean = mean(.), sd = sd(.))))
This throws an error because across() doesn't seem to naturally handle functions that return multiple columns.
My specific questions are:
1. How can I use across() for functions that return multiple values?
2. Is there a better way to achieve this using dplyr or another package in R?
3. How should I use across() when dealing with custom functions like this?

Any guidance on how to accomplish this would be greatly appreciated!
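On the first question, one commonly used pattern is to pass across() a named list of functions rather than a single function that returns a vector. As far as I know, a length-2 return value inside summarise() is treated as extra rows per group, not as extra columns, which is why the attempt above does not give the desired shape. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the set.seed() call is my addition for reproducibility, and the .names pattern is chosen only to match the column names requested in the question.

```r
library(dplyr)

set.seed(1)  # assumption: added so the example is reproducible
df <- data.frame(
  Group = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)

# Each function in the named list produces one output column per input column.
# .names controls the output names: {.fn} is the list name, {.col} the column.
df_summary <- df %>%
  group_by(Group) %>%
  summarise(
    across(starts_with("Value"),
           list(Mean = mean, SD = sd),
           .names = "{.fn}_{.col}"),
    .groups = "drop"
  )

names(df_summary)
```

The result has one row per group and the columns Group, Mean_Value1, SD_Value1, Mean_Value2 and SD_Value2. Additional statistics can be added by extending the list, for example with Median = median.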
To address the question "Is there a better way to achieve this using dplyr or another package in R?":
There are a couple of packages that provide grouped summaries like this. If we define "better" as not relying on external packages, base R's aggregate() can do it:
aggregate(df[grepl("Value", names(df))], df["Group"], \(x) c(Mean=mean(x), SD=sd(x)))
giving
  Group Value1.Mean Value1.SD Value2.Mean Value2.SD
1     a   10.901248  2.365063   4.5826417 0.8582879
2     b    9.358671  2.549811   4.9142623 1.0512226
3     c   11.040255  1.491652   5.2339545 1.0130163
This might be an alternative if the way aggregate() displays column names does not bother you.
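If the names do matter, it helps to know that aggregate() stores each multi-value result as a matrix column, so the Value1.Mean style names exist only in the printed output. A small sketch of flattening the result into ordinary columns with do.call(data.frame, ...) follows; the set.seed() call and the object names res and flat are my own additions for illustration.

```r
set.seed(1)  # assumption: added so the example is reproducible
df <- data.frame(
  Group = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)

# Same aggregate() call as above, written with function(x) so it also
# runs on R versions older than 4.1 that lack the \(x) shorthand.
res <- aggregate(df[grepl("Value", names(df))], df["Group"],
                 function(x) c(Mean = mean(x), SD = sd(x)))

names(res)  # "Group" "Value1" "Value2": Value1 and Value2 are matrices

# Rebuilding the data frame splits each matrix into separate columns.
flat <- do.call(data.frame, res)

names(flat)  # "Group" "Value1.Mean" "Value1.SD" "Value2.Mean" "Value2.SD"
```

After flattening, the columns can be renamed with names(flat) <- ... or setNames() to get the Mean_Value1 style from the question.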