r dplyr rowwise across

Combine: rowwise(), mutate(), across(), for multiple functions


This is somewhat related to this question. In principle, I am trying to understand how row-wise operations with mutate() across multiple columns work when applying more than one function (mean(), sum(), min(), etc.).

I have learned that across() does this job and not c_across(). I have also learned that mean() differs from min() in that mean() doesn't work on data frames, so the data must first be converted to a vector, which can be done with unlist() or as.matrix(). I learned this from Ronak Shah in "Understanding rowwise() and c_across()".

Now to my actual case: I was able to do this task, but I lose column d. How can I avoid losing column d in this setting?

My df:

df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b", 
"c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Does not work:

df %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE), 
         max = max(unlist(cur_data()), na.rm = TRUE)
  )

# Output:
      a     b     c d         e   avg min   max  
  <int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1     1     6    11 a         1    NA 1     a    
2     2     7    12 b         2    NA 12    b    
3     3     8    13 c         3    NA 13    c    
4     4     9    14 d         4    NA 14    d    
5     5    10    15 e         5    NA 10    e 
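The NA values and the <chr> columns in this output are consistent with how unlist() coerces mixed types: once a character value is present, every value becomes a character string. A minimal sketch, using a hand-built stand-in for row 2 (row_values is a hypothetical name) rather than cur_data() itself:

```r
# Stand-in for row 2 of df: four integers plus the character column d
row_values <- unlist(list(a = 2L, b = 7L, c = 12L, d = "b", e = 2L))

# Everything has been coerced to character
row_class <- class(row_values)

# mean() of a character vector warns and returns NA
row_mean <- suppressWarnings(mean(row_values, na.rm = TRUE))

# min() and max() now compare strings, not numbers; in typical locales
# digits sort before letters, so "12" comes first and "b" comes last
row_min <- min(row_values)
row_max <- max(row_values)

print(row_class)  # "character"
print(row_mean)   # NA
```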

Works, but I lose column d:

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE), 
         max = max(unlist(cur_data()), na.rm = TRUE)
  )

      a     b     c     e   avg   min   max
  <int> <int> <int> <int> <dbl> <dbl> <dbl>
1     1     6    11     1  4.75     1    11
2     2     7    12     2  5.75     2    12
3     3     8    13     3  6.75     3    13
4     4     9    14     4  7.75     4    14
5     5    10    15     5  8.75     5    15

Solution

  • Using pmap() from purrr may be preferable, since you need to select the data just once and you can use the select helpers:

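    library(dplyr)
    library(purrr)
    # Row-wise max, min and mean over the numeric columns only; d is kept
    df %>% 
     mutate(pmap_dfr(across(where(is.numeric)),
                     ~ data.frame(max = max(c(...)),
                                  min = min(c(...)),
                                  avg = mean(c(...)))))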
    
          a     b     c d         e   max   min   avg
      <int> <int> <int> <chr> <int> <int> <int> <dbl>
    1     1     6    11 a         1    11     1  4.75
    2     2     7    12 b         2    12     2  5.75
    3     3     8    13 c         3    13     3  6.75
    4     4     9    14 d         4    14     4  7.75
    5     5    10    15 e         5    15     5  8.75
    

    Or with the addition of tidyr:

    library(tidyr)
    # Collect the row-wise results in a list column, then spread it into columns
    df %>% 
     mutate(res = pmap(across(where(is.numeric)),
                       ~ list(max = max(c(...)),
                              min = min(c(...)),
                              avg = mean(c(...))))) %>%
     unnest_wider(res)
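For comparison, here is a sketch that stays within dplyr and keeps the rowwise() structure from the question. This is an alternative, not the approach given in the answer above. It assumes the numeric columns are named explicitly inside c_across(), so that neither column d nor the newly created summary columns are picked up; res is a hypothetical name for the result.

```r
library(dplyr)

# Same data as in the question, rebuilt with tibble()
df <- tibble(a = 1:5, b = 6:10, c = 11:15,
             d = c("a", "b", "c", "d", "e"), e = 1:5)

res <- df %>%
  rowwise() %>%
  mutate(
    # c_across() returns the selected columns of the current row as a vector
    avg = mean(c_across(c(a:c, e)), na.rm = TRUE),
    min = min(c_across(c(a:c, e)), na.rm = TRUE),
    max = max(c_across(c(a:c, e)), na.rm = TRUE)
  ) %>%
  ungroup()  # drop the row-wise grouping once the summaries are computed

print(res)
```

Naming the columns explicitly avoids a selection such as where(is.numeric) also matching avg, min or max once they exist within the same mutate() call.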