Maybe the title doesn't make it too clear, but I didn't know how to formulate it better. I develop the question:
I am working in r with a data frame like the following but much larger:
df <- data.frame(ref = c("01","02","03","04","05"),
var1 = c(2,3,6,8,5),
var2 = c(3,8,1,5,7),
var3 = c(1,1,4,5,6),
columns_to_sum = c("var1, var2",
"var1, var3",
"var2, var3",
"var1, var2, var3",
"var1"))
What I want to get is a new column that is the result of the sum of the values of the columns whose name is in the column: "columns_to_sum". Thus, the resulting data frame would be similar to :
Just in case I add the result to the reproducible example in the lines of code below:
df <- data.frame(ref = c("01","02","03","04","05",
var1 = c(2,3,6,8,5),
var2 = c(3,8,1,5,7),
var3 = c(1,1,4,5,6),
columns_to_sum = c("var1, var2",
"var1, var3",
"var2, var3",
"var1, var2, var3",
"var1"),
result_column = c(5,4,5,18,5)
)
I have used the dplyr::select function inside the rowSums function a couple of times, for example:
df_rs <- df %>% mutate(
result_column = rowSums(dplyr::select(., matches("var")))
)
However, I cannot find a way to select for the sum the columns that are included in the column: "columns_to_sum".
Any idea?
Thank you very much in advance!
Loop through rows, get column names - strsplit, and sum:
cbind(df,
result = sapply(seq(nrow(df)), function(i){
cols <- unlist(strsplit(df$columns_to_sum[ i ], ", ", fixed = TRUE))
sum(df[i, cols, drop = FALSE])
}))
# ref var1 var2 var3 columns_to_sum result
# 1 01 2 3 1 var1, var2 5
# 2 02 3 8 1 var1, var3 4
# 3 03 6 1 4 var2, var3 5
# 4 04 8 5 5 var1, var2, var3 18
# 5 05 5 7 6 var1 5