rdplyrcoalescemutate

How to coalesce multiple columns in a list of several dataframes based on part of the variable names?


I have a long list of data frames where in each dataframe, there are some columns that need to be coalesced based on the name. That is, the values in the columns including Duo: should replace the values in a corresponding row in a column whose name equals the string following Duo:. For instance, the non-NA values in Duo:side should replace the corresponding values in side column. Please see the data and the desired output below:

data <- list(foo = structure(list(id = c("AA", "BB", "CC", "NN"), bodyPart = c("leg", "arm", "knee", "eye"), side = c("LEFT", "RIGHT", "LEFT", "LEFT"), device = c("LLI", "LSM", "GHT", "LLM"), length = c(12, 476, 
  7, 2), mwID = c("a12", "k87", "j98", NA), `Duo:length` = c(23, NA, 77, NA)), row.names = c(NA, -4L), class = "data.frame"), bar = structure(list(id = c("DC", "DD", "FF", "ZZ"), bodyPart = c("ARM", "ear", "knee", "ankel"), side = c("RIGHT", "RIGHT", "RIGHT", 
  "LEFT"), device = c("RWS", "LSM", "YYY", "GOM"), tqID = c("fg12", "ll23", "sc26", "kj8"), `Duo:side` = c(NA, "LEFT", "LEFT", NA), `Duo:device` = c(NA, NA, NA, "GGT")), row.names = c(NA, -4L), class = "data.frame"))

output <- list(foo = structure(list(id = c("AA", "BB", "CC", "NN"), bodyPart = c("leg", "arm", "knee", "eye"), side = c("LEFT", "RIGHT", "LEFT", "LEFT"), device = c("LLI", "LSM", "GHT", "LLM"), length = c(23, 476, 
  77, 2), mwID = c("a12", "k87", "j98", NA)), row.names = c(NA, -4L), class = "data.frame"), bar = structure(list(id = c("DC", "DD", "FF", "ZZ"), bodyPart = c("ARM", "ear", "knee", "ankel"), side = c("RIGHT", "LEFT", "LEFT", 
  "LEFT"), device = c("RWS", "LSM", "YYY", "GGT"), tqID = c("fg12", "ll23", "sc26", "kj8")), row.names = c(NA, -4L), class = "data.frame"))

What I have done is the following which only works for one data frame and there is still a problem; I want the column that was coalesced to be removed. I know how to remove it separately but it would be great to do it in one go. I would appreciate your advice.

dplyr::mutate(data$foo, length = dplyr::coalesce(`Duo:length`, length))

Solution

  • Here's a way in base R. I would also recommend checking the function dplyr::rows_update which might come in handy in this case:

    out <- lapply(data, \(x){
      duo = grep("Duo", colnames(x), value = TRUE)
      og = gsub("Duo\\:\\.", "", duo)
      
      x[og] <- mapply(\(y, z) ifelse(complete.cases(z), z, y), x[og], x[duo])
      x[-grep("Duo", colnames(x))]
    })
    
    # $foo
    #   id bodyPart  side device length mwID
    # 1 AA      leg  LEFT    LLI     23  a12
    # 2 BB      arm RIGHT    LSM    476  k87
    # 3 CC     knee  LEFT    GHT     77  j98
    # 4 NN      eye  LEFT    LLM      2 <NA>
    # 
    # $bar
    #   id bodyPart  side device tqID
    # 1 DC      ARM RIGHT    RWS fg12
    # 2 DD      ear  LEFT    LSM ll23
    # 3 FF     knee  LEFT    YYY sc26
    # 4 ZZ    ankel  LEFT    GGT  kj8