rdataframeloopsleft-joinmultiple-columns

How to get a value from a column from a dataframe if the id is present in multiple columns in another dataframe?


I have 2 dataframes, df1 which are ID numbers of participants (some have more than one ID but is the same person)

df1 <-data.frame(hospid_1=c("A", "B", "C"),
                 hospid_2=c("E", "", "I"),
                hospid_3=c("", "K", ""),
                test = c("jack", "will", "gil"))

and df2 which are results of a blood test with the hospids that correspond to df1 (but in different columns)

df2 <- data.frame(hospid_1=c("A", "I", "K"),
                     Feature=c("123", "456", "789"))

How do I get the test column from df1 matched to the hospids from df2?

The results I would like is

  hospid Feature test
1      A     123 jack
2      I     456  gil
3      K     789  wil

This is a simplified version, I have thousands of hospids in df2 with the master file of df1 that needs matching up. Short of doing it line by line with left_join to each hospid_1/2/3 column, is there a faster and cleaner way?


Solution

  • One way in base R:

    rows <- Reduce(
      \(x,y) ifelse(is.na(x), y, x), 
      lapply(df1[sprintf("hospid_%d", 1:3)], \(col) match(df2$hospid_1, col))
    )
    
    df2$test <- df1$test[rows]
    
    df2
    #   hospid_1 Feature test
    # 1        A     123 jack
    # 2        I     456  gil
    # 3        K     789 will