I have a long list of data frames where in each dataframe, there are some columns that need to be coalesced based on the name. That is, the values in the columns including Duo:
should replace the values in a corresponding row in a column whose name equals the string following Duo:
. For instance, the non-NA values in Duo:side
should replace the corresponding values in side
column. Please see the data and the desired output below:
data <- list(foo = structure(list(id = c("AA", "BB", "CC", "NN"), bodyPart = c("leg", "arm", "knee", "eye"), side = c("LEFT", "RIGHT", "LEFT", "LEFT"), device = c("LLI", "LSM", "GHT", "LLM"), length = c(12, 476,
7, 2), mwID = c("a12", "k87", "j98", NA), `Duo:length` = c(23, NA, 77, NA)), row.names = c(NA, -4L), class = "data.frame"), bar = structure(list(id = c("DC", "DD", "FF", "ZZ"), bodyPart = c("ARM", "ear", "knee", "ankel"), side = c("RIGHT", "RIGHT", "RIGHT",
"LEFT"), device = c("RWS", "LSM", "YYY", "GOM"), tqID = c("fg12", "ll23", "sc26", "kj8"), `Duo:side` = c(NA, "LEFT", "LEFT", NA), `Duo:device` = c(NA, NA, NA, "GGT")), row.names = c(NA, -4L), class = "data.frame"))
output <- list(foo = structure(list(id = c("AA", "BB", "CC", "NN"), bodyPart = c("leg", "arm", "knee", "eye"), side = c("LEFT", "RIGHT", "LEFT", "LEFT"), device = c("LLI", "LSM", "GHT", "LLM"), length = c(23, 476,
77, 2), mwID = c("a12", "k87", "j98", NA)), row.names = c(NA, -4L), class = "data.frame"), bar = structure(list(id = c("DC", "DD", "FF", "ZZ"), bodyPart = c("ARM", "ear", "knee", "ankel"), side = c("RIGHT", "LEFT", "LEFT",
"LEFT"), device = c("RWS", "LSM", "YYY", "GGT"), tqID = c("fg12", "ll23", "sc26", "kj8")), row.names = c(NA, -4L), class = "data.frame"))
What I have done is the following which only works for one data frame and there is still a problem; I want the column that was coalesced to be removed. I know how to remove it separately but it would be great to do it in one go. I would appreciate your advice.
dplyr::mutate(data$foo, length = dplyr::coalesce(`Duo:length`, length))
Here's a way in base R. I would also recommend checking the function dplyr::rows_update
which might come in handy in this case:
out <- lapply(data, \(x){
duo = grep("Duo", colnames(x), value = TRUE)
og = gsub("Duo\\:\\.", "", duo)
x[og] <- mapply(\(y, z) ifelse(complete.cases(z), z, y), x[og], x[duo])
x[-grep("Duo", colnames(x))]
})
# $foo
# id bodyPart side device length mwID
# 1 AA leg LEFT LLI 23 a12
# 2 BB arm RIGHT LSM 476 k87
# 3 CC knee LEFT GHT 77 j98
# 4 NN eye LEFT LLM 2 <NA>
#
# $bar
# id bodyPart side device tqID
# 1 DC ARM RIGHT RWS fg12
# 2 DD ear LEFT LSM ll23
# 3 FF knee LEFT YYY sc26
# 4 ZZ ankel LEFT GGT kj8