This is somewhat related to this question:
In principle, I am trying to understand how rowwise operations with mutate work across multiple columns when applying more than one function, such as mean(), sum(), min(), etc.
I have learned that across does this job, and not c_across.
I have also learned that mean() differs from min() in that mean() doesn't work on data frames, so the data must first be converted to a vector, which can be done with unlist() or as.matrix() (learned from Ronak Shah here: Understanding rowwise() and c_across()).
Now for my actual case: I was able to do this task, but I lose column d. How can I avoid losing column d in this setting?
My df:
df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b",
"c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Does not work:
df %>%
  rowwise() %>%
  mutate(across(a:e),
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE)
  )
# Output:
a b c d e avg min max
<int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1 1 6 11 a 1 NA 1 a
2 2 7 12 b 2 NA 12 b
3 3 8 13 c 3 NA 13 c
4 4 9 14 d 4 NA 14 d
5 5 10 15 e 5 NA 10 e
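The output above follows from how `unlist()` handles mixed types: `cur_data()` includes the character column `d`, and unlisting a row that mixes integers and characters coerces everything to character. `mean()` then returns `NA`, while `min()`/`max()` compare strings alphabetically, which is why row 2 reports a minimum of "12". A minimal sketch with the first row written out by hand:

```r
# Row 1 of df, written out as a list (all five columns, including d)
row <- list(a = 1L, b = 6L, c = 11L, d = "a", e = 1L)

v <- unlist(row)  # mixing integer and character coerces everything to character
class(v)          # "character"
min(v)            # "1"  (string comparison, not numeric)
max(v)            # "a"  (letters sort after digits)
mean(v)           # NA, with the warning "argument is not numeric or logical"
```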
This works, but I lose column d:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(across(a:e),
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE)
  )
a b c e avg min max
<int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1 6 11 1 4.75 1 11
2 2 7 12 2 5.75 2 12
3 3 8 13 3 6.75 3 13
4 4 9 14 4 7.75 4 14
5 5 10 15 5 8.75 5 15
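One way to keep `d` while staying with `rowwise()` is to build the per-row vector only from the numeric columns with `c_across()`, instead of unlisting `cur_data()`. This is a minimal sketch assuming dplyr >= 1.0.0; `num_cols` is a hypothetical helper introduced here. Listing the columns explicitly, rather than using `where(is.numeric)`, avoids any question of whether the freshly created `avg` column is picked up by the later `min`/`max` expressions.

```r
library(dplyr)

# The df from the question
df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b",
  "c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df",
  "tbl", "data.frame"))

# Hypothetical helper: the numeric columns to summarise (d is left out)
num_cols <- c("a", "b", "c", "e")

res <- df %>%
  rowwise() %>%
  mutate(
    # c_across() returns the current row's values of the selected columns as one vector
    avg = mean(c_across(all_of(num_cols)), na.rm = TRUE),
    min = min(c_across(all_of(num_cols)), na.rm = TRUE),
    max = max(c_across(all_of(num_cols)), na.rm = TRUE)
  ) %>%
  ungroup()  # drop the rowwise grouping again

res
```

Column `d` stays in `res` because nothing is removed from `df`; it is only excluded from the per-row vector.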
Using pmap() from purrr might be preferable, since you need to select the data only once and you can use the select helpers:
library(dplyr)
library(purrr)

df %>%
  mutate(pmap_dfr(across(where(is.numeric)),
                  ~ data.frame(max = max(c(...)),
                               min = min(c(...)),
                               avg = mean(c(...)))))
a b c d e max min avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 11 1 4.75
2 2 7 12 b 2 12 2 5.75
3 3 8 13 c 3 13 3 6.75
4 4 9 14 d 4 14 4 7.75
5 5 10 15 e 5 15 5 8.75
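To see what `pmap()` hands to the formula, consider a tiny list of two columns (`cols` is a toy example, not from the question). `pmap()` calls the function once per row, passing each column's value as a named argument; inside a purrr formula lambda those arguments are reachable through `...`, so `c(...)` collects one row into a vector.

```r
library(purrr)

# Hypothetical two-column input, only for illustration
cols <- list(a = 1:2, b = c(10L, 20L))

pmap(cols, ~ c(...))             # list(c(a = 1L, b = 10L), c(a = 2L, b = 20L))
pmap_dbl(cols, ~ mean(c(...)))   # 5.5 11
```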
Or, with the addition of tidyr:
library(tidyr)

df %>%
  mutate(res = pmap(across(where(is.numeric)),
                    ~ list(max = max(c(...)),
                           min = min(c(...)),
                           avg = mean(c(...))))) %>%
  unnest_wider(res)
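For these particular summaries there is also a vectorised route that needs neither `rowwise()` nor `pmap()`: base R's `pmax()`, `pmin()` and `rowMeans()` work element-wise across columns. This is a sketch of an alternative, not part of the answer above; it assumes dplyr >= 1.0.0 for `where()`, and `num` is a hypothetical intermediate. Selecting `num` once, outside `mutate()`, ensures the new columns are never fed back into the calculation.

```r
library(dplyr)

# The df from the question
df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b",
  "c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df",
  "tbl", "data.frame"))

# Numeric columns only (a, b, c, e), selected once
num <- select(df, where(is.numeric))

res <- df %>%
  mutate(
    max = do.call(pmax, c(num, na.rm = TRUE)),  # element-wise max across the columns
    min = do.call(pmin, c(num, na.rm = TRUE)),  # element-wise min across the columns
    avg = rowMeans(num, na.rm = TRUE)           # mean of each row
  )

res
```

Because each function processes whole columns at once, this avoids one R function call per row, which may matter on larger data.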