I have about 40 columns that I want to summarize by group using dplyr::summarize()
. But each column is going to be summarized with a different function: some with mean()
, some with sum()
, some with min()
, and some with max()
. Only one function will be applied to each column. I have a lookup table to match each column name with the appropriate function. What is the best way to do this with dplyr::summarize()
without having to write out each statement individually?
Here is an example lookup table using mtcars. The code below it does not give the desired result. It applies all ten summary functions to all ten columns for a total of 100 summaries, where I only want ten summaries, one for each column.
library(dplyr)
lookup_table <- tibble(
variable = c('mpg', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'),
summ_fn = list(mean, max, min, sum, mean, mean, median, mean, sum, max)
)
mtcars %>%
group_by(cyl) %>%
summarize(across(all_of(lookup_table$variable), .fns = lookup_table$summ_fn))
Instead of a tibble, initiate your lookup table to a named list where the names are the columns (i.e, lookup_table <- list("mpg" = mean, "disp" = max,...)
. Then you can use dplyr::curr_column
to index the function to the current column:
(Note that since I was a bit lazy and you (thankfully) provided the lookup table, I just used setNames
to assign the names.)
lookup_table <- setNames(list(mean, max, min, sum, mean, mean, median, mean, sum, max),
c('mpg', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'))
mtcars %>%
summarize(across(names(lookup_table),
~ lookup_table[[cur_column()]](.)),
.by = cyl)
As @G. Grothendieck points out in the comments, if you already have the lookup table in tibble form, you can create a little helper function and achieve the same goal:
lookup <- function(x) {
with(lookup_table, summ_fn[[match(x, variable)]])
}
mtcars %>%
summarize(across(all_of(lookup_table$variable),
~ lookup(cur_column())(.)),
.by = cyl)
Output (for both approaches):
# cyl mpg disp hp drat wt qsec vs am gear carb
# 1 6 19.74286 258.0 105 25.10 3.117143 17.97714 1 0.4285714 27 6
# 2 4 26.66364 146.7 52 44.78 2.285727 19.13727 1 0.7272727 45 2
# 3 8 15.10000 472.0 150 45.21 3.999214 16.77214 0 0.1428571 46 8