rreplacecoalesce

How to replace NAs in a dataframe with values in another dataframe for non-unique keys?


I would like to replace the values from two columns (board and date) in df1 when they equal NA in the same columns in df2. The key is subject: So in this case, for vand and haap. The problem is subjects (e.g., vand) in df1 are not unique, but the values in board and date columns are always the same. I would appreciate your advice.

df1 <- structure(list(nature = c("sop", "dior", "coats", "sem", "wia", 
    "bodo"), subject = c("gank", "vand", "vand", 
    "jav", "vand", "haap"), board = c("REW", "EWW", "EWW", "SSD", 
    "EWW", "MMB"), date = c("2023-07-12", 
    "2023-06-09", "2023-06-09", 
    "2023-06-09", "2023-06-09", 
    "2023-03-05")), row.names = c(NA, -6L), class = c("tbl_df", 
    "tbl", "data.frame"))

df2 <- structure(list(type = c("single", "couple", "couple", "couple", "couple", 
  "couple", "single", "couple", "couple", "couple"), name = c("ZIA", 
  "MIA", "lMIA", "LIA", 
  "LIA", "LIA", "DIA", 
  "LIA", "MIA", "SIA"
  ), subject = c("vand", "vank", "vank", 
  "jav", "tral", "twe", 
  "haap", "der", "leo", 
  "sdee"), board = c(NA, 
  "SSD", "REW", "EWW", "WWS, DDC", "SSD", 
  NA, "QQW", "XXD", "GGH"
  ), date = c(NA, "2023-07-03", "2023-07-03", 
  "2023-07-17", "2023-07-17", 
  "2023-01-16", NA, 
  "2023-07-17", "2023-06-08", 
  "2023-07-17")), class = "data.frame", row.names = c(NA, 
  -10L))

desired output:

 df3 <- structure(list(type = c("single", "couple", "couple", "couple", "couple", 
  "couple", "single", "couple", "couple", "couple"), name = c("ZIA", 
  "MIA", "lMIA", "LIA", 
  "LIA", "LIA", "DIA", 
  "LIA", "MIA", "SIA"
  ), subject = c("vand", "vank", "vank", 
  "jav", "tral", "twe", 
  "haap", "der", "leo", 
  "sdee"), board = c("EWW", 
  "SSD", "REW", "EWW", "WWS, DDC", "SSD", 
  "MMB", "QQW", "XXD", "GGH"
  ), date = c("2023-06-09", "2023-07-03", "2023-07-03", 
  "2023-07-17", "2023-07-17", 
  "2023-01-16", "2023-03-05", 
  "2023-07-17", "2023-06-08", 
  "2023-07-17")), class = "data.frame", row.names = c(NA, 
  -10L))    

Solution

  • You can use unique(df1) to reduce your first data.frame to unique rows, and then use dplyr::rows_update to replace the values:

    dplyr::rows_update(df2, unique(df1), unmatched = "ignore")
    

    output

    #      type name subject    board       date
    # 1  single  ZIA    vand      EWW 2023-06-09
    # 2  couple  MIA    vank      SSD 2023-07-03
    # 3  couple lMIA    vank      REW 2023-07-03
    # 4  couple  LIA     jav      SSD 2023-06-09
    # 5  couple  LIA    tral WWS, DDC 2023-07-17
    # 6  couple  LIA     twe      SSD 2023-01-16
    # 7  single  DIA    haap      MMB 2023-03-05
    # 8  couple  LIA     der      QQW 2023-07-17
    # 9  couple  MIA     leo      XXD 2023-06-08
    # 10 couple  SIA    sdee      GGH 2023-07-17