[SOLVED] Rowwise sum of subset of columns, column names are stored in another column

Rowwise sum of subset of columns, column names are stored in another column

Maybe the title doesn't make it too clear, but I didn't know how to formulate it better. I develop the question:

I am working in r with a data frame like the following but much larger:

df <- data.frame(ref = c("01","02","03","04","05"), 
                         var1 = c(2,3,6,8,5),
                         var2 = c(3,8,1,5,7),
                         var3 = c(1,1,4,5,6),
                         columns_to_sum = c("var1, var2", 
                                            "var1, var3", 
                                            "var2, var3",
                                            "var1, var2, var3",
                                            "var1"))

What I want to get is a new column that is the result of the sum of the values of the columns whose name is in the column: "columns_to_sum". Thus, the resulting data frame would be similar to :

Just in case I add the result to the reproducible example in the lines of code below:

df <- data.frame(ref = c("01","02","03","04","05", 
                 var1 = c(2,3,6,8,5),
                 var2 = c(3,8,1,5,7),
                 var3 = c(1,1,4,5,6),
                 columns_to_sum = c("var1, var2", 
                    "var1, var3", 
                    "var2, var3",
                    "var1, var2, var3",
                    "var1"),
                 result_column = c(5,4,5,18,5)
                 )

I have used the dplyr::select function inside the rowSums function a couple of times, for example:

df_rs <- df %>% mutate(
  result_column = rowSums(dplyr::select(., matches("var")))
  )

However, I cannot find a way to select for the sum the columns that are included in the column: "columns_to_sum".

Any idea?

Thank you very much in advance!

Solution

Loop through rows, get column names - strsplit, and sum:

cbind(df, 
      result = sapply(seq(nrow(df)), function(i){
        cols <- unlist(strsplit(df$columns_to_sum[ i ], ", ", fixed = TRUE))
        sum(df[i, cols, drop = FALSE])
      }))

#   ref var1 var2 var3   columns_to_sum result
# 1  01    2    3    1       var1, var2      5
# 2  02    3    8    1       var1, var3      4
# 3  03    6    1    4       var2, var3      5
# 4  04    8    5    5 var1, var2, var3     18
# 5  05    5    7    6             var1      5