Considering this is my dataset
df <-
structure(list(id = 1:6, w = c(NA, 11L, 12L, 13L, 14L, 15L),
x = c(20L, 21L, NA, 23L, 24L, 25L), y = c(30L, 31L, 32L,
NA, 34L, 35L), z = c(40L, NA, 42L, 43L, 44L, NA), Group = c("Group1",
"Group2", "Group2", "Group2", "Group3", "Group3")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
id w x y z Group
1 NA 20 30 40 Group1
2 11 21 31 NA Group2
3 12 NA 32 42 Group2
4 13 23 NA 43 Group2
5 14 24 34 44 Group3
6 15 25 35 NA Group3
I am trying to sum observations from columns w, x, y,z row wise by Groups. I am using case_when
statement like this.
df <- df %>%
mutate(
group1_total <- case_when( Group = "Group1" ~ rowSums(w,x, na.rm = TRUE)),
group2_total <- case_when( Group = "Group2" ~ rowSums(w,x,z, na.rm = TRUE)),
group3_total <- case_when( Group = "Group3" ~ rowSums(w,x,y,z, na.rm = TRUE))
)
I am getting an error saying, error in rowSums()
. Not sure what I am doing wrong. Any suggestions or help is much appreciated, Thanks.
I would create a named list where we specify the variables to be summed for each group.
cur_data()
returns the data for the current group, group_cols[[unique(Group)]]
selects the appropriate columns for each group.
library(dplyr)
group_cols <- list('Group1' = c('w', 'x'), 'Group2' = c('w', 'x', 'z'),
'Group3' = c('w', 'x', 'y', 'z'))
df %>%
group_by(Group) %>%
mutate(total = rowSums(select(cur_data(),
group_cols[[unique(Group)]]), na.rm = TRUE)) %>%
ungroup
# id w x y z Group total
# <int> <int> <int> <int> <int> <chr> <dbl>
#1 1 NA 20 30 40 Group1 20
#2 2 11 21 31 NA Group2 32
#3 3 12 NA 32 42 Group2 54
#4 4 13 23 NA 43 Group2 79
#5 5 14 24 34 44 Group3 116
#6 6 15 25 35 NA Group3 75
The benefit of this is the calculation is done per group instead of per row.