rjointidyverse

How to Join Two Unequal-Length Columns by Matching Identical Strings


I have two tibbles with unique values, with unequal length like:

df1 <- structure(list(col1 = c("A", "T", "C", "D", "X", "F")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
df2 <- structure(list(col2 = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
> df1
# A tibble: 6 × 1
  col1 
  <chr>
1 A    
2 T    
3 C    
4 D    
5 X    
6 F 
> df2
# A tibble: 9 × 1
  col2 
  <chr>
1 A    
2 B    
3 C    
4 D    
5 F    
6 G    
7 H    
8 I    
9 J   

I would like to get:

> df3
# A tibble: 11 × 2
   col1  col2 
   <chr> <chr>
 1 A     A    
 2 NA    B    
 3 T     NA   
 4 C     C    
 5 D     D    
 6 X     NA   
 7 F     F    
 8 NA    G    
 9 NA    H    
10 NA    I    
11 NA    J

Each identical string in col1 of df1 and col2 in df2 should be in the same row side by side. For example, if a string in col1 and col2 is the same, they should be in the same row (e.g., string A). If a string exists in col1 but not in col2, it should be NA in col2, and vice versa.

I would be grateful for any further advice.


Solution

  • library(dplyr)
    
    full_join(df1, df2, by = join_by(col1 == col2), keep = TRUE)
    #> # A tibble: 12 × 2
    #>    col1  col2 
    #>    <chr> <chr>
    #>  1 A     A    
    #>  2 T     <NA> 
    #>  3 C     C    
    #>  4 D     D    
    #>  5 X     <NA> 
    #>  6 F     F    
    #>  7 <NA>  B    
    #>  8 <NA>  E    
    #>  9 <NA>  G    
    #> 10 <NA>  H    
    #> 11 <NA>  I    
    #> 12 <NA>  J