I want to sum across an arbitrary number of columns whose names start with the same string pattern. Each new column in the resulting data frame should be named after the string that was used in the column-name search. However, I am not sure how to generate those names automatically so that they come out in a given format, e.g.
c(m = "m", w = "w")
I want to use lapply in combination with rowSums, like this:
lapply(c(m = "m", w = "w"),
\(x) rowSums(df[startsWith(names(df), x)]))
Basic input:
# m_16 w_16 w_17 m_17 w_18 m_18
#values1 3 4 8 1 12 4
#values2 8 0 12 1 3 2
Desired output:
# m_16 w_16 w_17 m_17 w_18 m_18 m w
#values1 3 4 8 1 12 4 8 24
#values2 8 0 12 1 3 2 11 15
However, as mentioned above, there could be more columns, and they could start with z, w, etc.; sums should be calculated for those as well. So I want to vectorize the column naming rather than assign the column names by hand.
I have looked through other Stack Overflow threads, but I wasn't sure how to search for this problem and have no idea how to solve it myself, besides assigning the column names afterwards.
Supposing your first column is named # and the other columns are named in a pattern like letter_SomethingElse:
# Keep only the leading letter of each column name; [-1] drops the entry for the first column (#).
search_pattern <- unique(gsub("(?<=^[a-z]).*", "", names(df), perl = TRUE))[-1]
names(search_pattern) <- search_pattern
cbind(df, lapply(search_pattern, \(x) rowSums(df[startsWith(names(df), x)])))
returns
# m_16 w_16 w_17 m_17 w_18 m_18 m w
1 #values1 3 4 8 1 12 4 8 24
2 #values2 8 0 12 1 3 2 11 15
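For reference, here is a self-contained sketch of the same idea. It rests on a few assumptions that are not in the original post: the data frame is my reconstruction of the example input, the names `value_cols`, `prefixes`, `sums` and `result` are made up for illustration, and a prefix is taken to be everything before the first underscore (so multi-letter prefixes would also work). The naming comes for free because lapply keeps the names of the vector it iterates over.

```r
# Hypothetical reconstruction of the example input; the first column is named "#".
df <- data.frame(
  id   = c("#values1", "#values2"),
  m_16 = c(3, 8), w_16 = c(4, 0), w_17 = c(8, 12),
  m_17 = c(1, 1), w_18 = c(12, 3), m_18 = c(4, 2)
)
names(df)[1] <- "#"

# Assumption: the prefix is everything before the first underscore.
value_cols <- names(df)[-1]
prefixes <- unique(sub("_.*$", "", value_cols))

# lapply() preserves the names of its input, so naming the vector
# by itself names the resulting list (and thus the new columns).
names(prefixes) <- prefixes

# Sum, per row, all columns whose name starts with "<prefix>_".
sums <- lapply(prefixes, \(p) rowSums(df[startsWith(names(df), paste0(p, "_"))]))

result <- cbind(df, sums)
print(result)
```

Matching on the prefix plus the underscore, rather than on the bare letter, avoids accidentally picking up a column such as a hypothetical `mw_16` when summing the `m` group.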