I've got a dataset like this:
df1 <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(2, 8, 9),
col4 = c(5, 11, 12),
col5 = c(13, 14, 15),
col6 = c(16, 17, 18),
col7 = c(19, 20, 21),
col8 = c(22, 23, 24)
)
and a second that has a 'key' of matches that I'm looking for in df1 :
df2 <- data.frame(
colA = c(1, 2, 3),
colB = c(4, 5, 6),
value = c(100, 200, 300)
)
what I'm trying to do is find each pair-wise column match from left to right and create a new column containing the value from df2 everytime there is a match so that it looks like this:
df3 <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(2, 8, 9),
col4 = c(5, 11, 12),
col5 = c(13, 14, 15),
col6 = c(16, 17, 18),
col7 = c(19, 20, 21),
col8 = c(22, 23, 24),
match1 = c(100, 200, 300),
match2 = c(200, NA, NA)
)
I've tried this kind of approach:
df_match <- inner_join(df1, df2, by = c("col1" = "colA", "col2" = "colB"))
df1$matched_value <- df_match$value[match(paste(df1$col1, df1$col2), paste(df_match$col1, df_match$col2))]
but it only returns one match across the rows. The other issue is I'm running this through many iterations that have varying numbers of columns in df1. I'm thinking I need something along the lines of across rows starts with 'col' but I'm pretty stuck.
You can try the following using Reduce
:
df3 <- Reduce(left_join,
append(list(df1),
lapply(1:(ncol(df1) - 1), \(i)
setNames(df2, c(paste0("col", i+0:1), paste0("match", i))))))
Which returns:
col1 col2 col3 col4 col5 col6 col7 col8 match1 match2 match3 match4 match5 match6 match7
1 1 4 2 5 13 16 19 22 100 NA 200 NA NA NA NA
2 2 5 8 11 14 17 20 23 200 NA NA NA NA NA NA
3 3 6 9 12 15 18 21 24 300 NA NA NA NA NA NA
It would then be easy to omit the "match" columns that are all NA, if desired. Eg.
select(df3, where(~any(!is.na(.))))
col1 col2 col3 col4 col5 col6 col7 col8 match1 match3
1 1 4 2 5 13 16 19 22 100 200
2 2 5 8 11 14 17 20 23 200 NA
3 3 6 9 12 15 18 21 24 300 NA