[SOLVED] How to merge two dataframes using multiple columns as key?

How to merge two dataframes using multiple columns as key?

Say I have the following dataframes:

DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM"= rep(rnorm(36,10,1)),
                  "TEST" = rep(NA,36))

DF2 <- data.frame("A" = rep("A",6),
                  "B" = rep(c("C","D"),6),
                  "VAL" = rep(c(1,3),3))

*Note: Each unique combination of variables A and B in DF2 should have a unique VAL.

For each row, I would like to replace the NA in TEST with the corresponding value of VAL in DF1 if the values in columns A and A match and the values in columns B and B match for that row. Otherwise, I'd leave TEST as NA. How would I do this without looping through each combination using match?

Ideally, an answer would scale to two data frames with many columns to match upon.

Solution

As Akrun mentioned in comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current dataframe, this isn't a problem, but you will need additional rules if there are multiple possible values for the same combination. From there, the solution is easy:

DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = T)

Note that this will produce a new dataframe with an empty TEST column (all values NA), and a VAL column assigned from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code:

DF1$TEST <- merge(DF1, DF2.u, all = T)$VAL

EDIT: in response to your question, you can boil down DF2 if necessary quite simple:

DF2$C <- c(1:12) #now unique() won't work
DF2.u <- unique(DF2[1:3])

 A B VAL
1 A C   1
2 A D   3