rrowmultiple-columnsmatchingpairwise

Matching pair-wise columns from left to right across rows in one dataframe to another dataframe and adding new columns with matching values


I've got a dataset like this:

 df1 <- data.frame(
    col1 = c(1, 2, 3),
    col2 = c(4, 5, 6),
    col3 = c(2, 8, 9),
    col4 = c(5, 11, 12),
    col5 = c(13, 14, 15),
    col6 = c(16, 17, 18),
    col7 = c(19, 20, 21),
    col8 = c(22, 23, 24)
  )

and a second that has a 'key' of matches that I'm looking for in df1 :

df2 <- data.frame(
    colA = c(1, 2, 3),
    colB = c(4, 5, 6),
    value = c(100, 200, 300)
  )

what I'm trying to do is find each pair-wise column match from left to right and create a new column containing the value from df2 everytime there is a match so that it looks like this:

 df3 <- data.frame(
    col1 = c(1, 2, 3),
    col2 = c(4, 5, 6),
    col3 = c(2, 8, 9),
    col4 = c(5, 11, 12),
    col5 = c(13, 14, 15),
    col6 = c(16, 17, 18),
    col7 = c(19, 20, 21),
    col8 = c(22, 23, 24),
    match1 = c(100, 200, 300),
    match2 = c(200, NA, NA)
  )

I've tried this kind of approach:

df_match <- inner_join(df1, df2, by = c("col1" = "colA", "col2" = "colB"))
  
  df1$matched_value <- df_match$value[match(paste(df1$col1, df1$col2), paste(df_match$col1, df_match$col2))]

but it only returns one match across the rows. The other issue is I'm running this through many iterations that have varying numbers of columns in df1. I'm thinking I need something along the lines of across rows starts with 'col' but I'm pretty stuck.


Solution

  • You can try the following using Reduce:

    df3 <- Reduce(left_join, 
       append(list(df1), 
              lapply(1:(ncol(df1) - 1), \(i)
                setNames(df2, c(paste0("col", i+0:1), paste0("match", i))))))
    

    Which returns:

      col1 col2 col3 col4 col5 col6 col7 col8 match1 match2 match3 match4 match5 match6 match7
    1    1    4    2    5   13   16   19   22    100     NA    200     NA     NA     NA     NA
    2    2    5    8   11   14   17   20   23    200     NA     NA     NA     NA     NA     NA
    3    3    6    9   12   15   18   21   24    300     NA     NA     NA     NA     NA     NA
    

    It would then be easy to omit the "match" columns that are all NA, if desired. Eg.

    select(df3, where(~any(!is.na(.))))
    
      col1 col2 col3 col4 col5 col6 col7 col8 match1 match3
    1    1    4    2    5   13   16   19   22    100    200
    2    2    5    8   11   14   17   20   23    200     NA
    3    3    6    9   12   15   18   21   24    300     NA