r dplyr

Tell left_join() to use another column if the first returns NA?


When using left_join() from dplyr to join one dataframe to another, is it possible to tell it to check against two columns?

My problem is that, for a large number of rows, a join by column A returns NAs because there is no match. However, these rows would match if the dataframes were joined by column B instead.

Can I tell left_join() to first try to join by column A, and if this returns an NA, try with column B? Most will be matched by column A, but I want to "save" the remaining rows by giving them a chance to join by another column.

Thanks for your help. Appreciate it.


Solution

  • The easiest approach is probably to first create a new column in your second dataframe which is equal to column A if it has a value, and equal to B if not. Then join on that new column.
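
A minimal sketch of that suggestion, not a definitive implementation. The dataframes `df1` and `df2`, the `value` column, and all the data are invented for illustration. It also assumes the same fallback key is built in the first dataframe, since the join needs a matching column on both sides. `coalesce()` from dplyr returns A where it is not NA and B otherwise. A second, two-pass variant is included for the case where A is filled in but simply has no match; that is a different technique from the single-key idea above, and it assumes A and B are each unique in the second dataframe.

```r
library(dplyr)

# Hypothetical example data (names and values are assumptions).
df1 <- tibble(
  id = 1:4,
  A  = c("a1", "a2", "a9", NA),
  B  = c("b1", "b2", "b3", "b4")
)
df2 <- tibble(
  A     = c("a1", "a2", NA, NA),
  B     = c("b1", "b2", "b3", "b4"),
  value = c(10, 20, 30, 40)
)

# Approach from the answer: one fallback key (A if present, otherwise B).
df1_keyed <- df1 %>% mutate(key = coalesce(A, B))
df2_keyed <- df2 %>%
  mutate(key = coalesce(A, B)) %>%
  select(key, value)

single_key <- left_join(df1_keyed, df2_keyed, by = "key")
# single_key$value is 10, 20, NA, 40: row 3 has an A ("a9"), so its key never falls back to B.

# Two-pass variant: join by A, join by B, keep the A result when there is one.
# Rows with a missing key are dropped from each lookup so NA never matches NA.
lookup_A <- df2 %>% filter(!is.na(A)) %>% select(A, value_A = value)
lookup_B <- df2 %>% filter(!is.na(B)) %>% select(B, value_B = value)

two_pass <- df1 %>%
  left_join(lookup_A, by = "A") %>%
  left_join(lookup_B, by = "B") %>%
  mutate(value = coalesce(value_A, value_B)) %>%
  select(-value_A, -value_B)
# two_pass$value is 10, 20, 30, 40: rows 3 and 4 are rescued by the join on B.
```

The single-key version is shorter but only helps when A is actually missing (NA). If A can hold a value that just does not appear in the other dataframe, the two-pass version is closer to "try A first, then B".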