rdplyrrowsum

r rowSums in case_when


Considering this is my dataset

df <- 
structure(list(id = 1:6, w = c(NA, 11L, 12L, 13L, 14L, 15L), 
    x = c(20L, 21L, NA, 23L, 24L, 25L), y = c(30L, 31L, 32L, 
    NA, 34L, 35L), z = c(40L, NA, 42L, 43L, 44L, NA), Group = c("Group1", 
    "Group2", "Group2", "Group2", "Group3", "Group3")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))


    id     w     x     y     z Group   
     1    NA    20    30    40 Group1
     2    11    21    31    NA Group2
     3    12    NA    32    42 Group2
     4    13    23    NA    43 Group2
     5    14    24    34    44 Group3
     6    15    25    35    NA Group3

I am trying to sum observations from columns w, x, y,z row wise by Groups. I am using case_when statement like this.

df <- df %>%
  mutate(
    
    group1_total <- case_when( Group = "Group1" ~ rowSums(w,x, na.rm = TRUE)),
    group2_total <- case_when( Group = "Group2" ~ rowSums(w,x,z, na.rm = TRUE)),
    group3_total <- case_when( Group = "Group3" ~ rowSums(w,x,y,z, na.rm = TRUE))   
    
  )

I am getting an error saying, error in rowSums(). Not sure what I am doing wrong. Any suggestions or help is much appreciated, Thanks.


Solution

  • I would create a named list where we specify the variables to be summed for each group.

    cur_data() returns the data for the current group, group_cols[[unique(Group)]] selects the appropriate columns for each group.

    library(dplyr)
    
    group_cols <- list('Group1' = c('w', 'x'), 'Group2' = c('w', 'x', 'z'), 
                       'Group3' = c('w', 'x', 'y', 'z'))
    
    df %>%
      group_by(Group) %>%
      mutate(total = rowSums(select(cur_data(), 
                             group_cols[[unique(Group)]]), na.rm = TRUE)) %>%
      ungroup
    
    #     id     w     x     y     z Group  total
    #  <int> <int> <int> <int> <int> <chr>  <dbl>
    #1     1    NA    20    30    40 Group1    20
    #2     2    11    21    31    NA Group2    32
    #3     3    12    NA    32    42 Group2    54
    #4     4    13    23    NA    43 Group2    79
    #5     5    14    24    34    44 Group3   116
    #6     6    15    25    35    NA Group3    75
    

    The benefit of this is the calculation is done per group instead of per row.