rdataframedplyrtidyverserowsum

Rowwise sum of subset of columns, column names are stored in another column


Maybe the title doesn't make it too clear, but I didn't know how to formulate it better. I develop the question:

I am working in r with a data frame like the following but much larger:

df <- data.frame(ref = c("01","02","03","04","05"), 
                         var1 = c(2,3,6,8,5),
                         var2 = c(3,8,1,5,7),
                         var3 = c(1,1,4,5,6),
                         columns_to_sum = c("var1, var2", 
                                            "var1, var3", 
                                            "var2, var3",
                                            "var1, var2, var3",
                                            "var1"))

What I want to get is a new column that is the result of the sum of the values of the columns whose name is in the column: "columns_to_sum". Thus, the resulting data frame would be similar to :

enter image description here

Just in case I add the result to the reproducible example in the lines of code below:

df <- data.frame(ref = c("01","02","03","04","05", 
                 var1 = c(2,3,6,8,5),
                 var2 = c(3,8,1,5,7),
                 var3 = c(1,1,4,5,6),
                 columns_to_sum = c("var1, var2", 
                    "var1, var3", 
                    "var2, var3",
                    "var1, var2, var3",
                    "var1"),
                 result_column = c(5,4,5,18,5)
                 )

I have used the dplyr::select function inside the rowSums function a couple of times, for example:

df_rs <- df %>% mutate(
  result_column = rowSums(dplyr::select(., matches("var")))
  )

However, I cannot find a way to select for the sum the columns that are included in the column: "columns_to_sum".

Any idea?

Thank you very much in advance!


Solution

  • Loop through rows, get column names - strsplit, and sum:

    cbind(df, 
          result = sapply(seq(nrow(df)), function(i){
            cols <- unlist(strsplit(df$columns_to_sum[ i ], ", ", fixed = TRUE))
            sum(df[i, cols, drop = FALSE])
          }))
    
    #   ref var1 var2 var3   columns_to_sum result
    # 1  01    2    3    1       var1, var2      5
    # 2  02    3    8    1       var1, var3      4
    # 3  03    6    1    4       var2, var3      5
    # 4  04    8    5    5 var1, var2, var3     18
    # 5  05    5    7    6             var1      5