rtidyversematchindex-match

index/match equivalent in R for merging data frames in a tidy way


I have two data frames in R that share SOME columns and SOME rows in an arbitrary order.

I now want to create a third data frame that has the columns and rows of data frame 1 merged with the corresponding values from data frame 2:

df1 <- data.frame(name = c("a", "b", "c"),
                  col1 = c(NA, NA, NA),
                  col2 = c(NA, NA, NA),
                  col3 = c(NA, NA, NA))

df2 <- data.frame(name = c("d", "b", "a"),
                  col2 = c(1, 2, 3),
                  col1 = c(4, 5, 6),
                  col4 = c(NA, NA, NA))

My expected output is:

  name col1 col2 col3
1    a    6    3   NA
2    b    5    2   NA
3    c   NA   NA   NA

I know how I could achieve this step by step by first creating a df with common cols and rows from both data frames, sort accordingly etc.

There's also the (my currently) preferred option of first pivot_longer both data frames, then do a left_join, then pivot_wider.

However, I'm wondering if there is a more direct (and tidy) way similar to Excel's combination of index & match where I would match columns and rows and write the df2 values directly into the respective df1 cell?


Solution

  • That's pretty much what dplyr::rows_patch() does, with few safeguards (or limitations) :
    - all columns in y(df2) must exist in x(df1)
    - types of matching x & y columns should be compatible when casting from y to x.

    For the former there's nothing that select(df2, any_of(names(df1)) can't fix. But regarding types, in the included example all-NA columns of df1 are logical (as class(NA) is "logical") while matching df2 columns are numerical and that would trigger a casting error:

    Can't convert from y$col1 <double> to x$col1 <logical> due to loss of precision.

    Which may or may not be relevant with your real data.

    library(dplyr)
    
    # change types for all-NA columns
    glimpse(df1)
    #> Rows: 3
    #> Columns: 4
    #> $ name <chr> "a", "b", "c"
    #> $ col1 <lgl> NA, NA, NA
    #> $ col2 <lgl> NA, NA, NA
    #> $ col3 <lgl> NA, NA, NA
    df1 <- mutate(df1, across(where( \(x) all(is.na(x) )), as.numeric)) 
    glimpse(df1)
    #> Rows: 3
    #> Columns: 4
    #> $ name <chr> "a", "b", "c"
    #> $ col1 <dbl> NA, NA, NA
    #> $ col2 <dbl> NA, NA, NA
    #> $ col3 <dbl> NA, NA, NA
    
    rows_patch(
      x = df1, 
      y = select(df2, any_of(names(df1))), 
      by = "name",
      unmatched = "ignore"
    )
    #>   name col1 col2 col3
    #> 1    a    6    3   NA
    #> 2    b    5    2   NA
    #> 3    c   NA   NA   NA
    

    If you do construct your target frames yourself, you could perhaps start with something like this instead:

    df1 <- data.frame(name = c("a", "b", "c"), 
                      col1 = NA_real_, col2 = NA_real_, col3 = NA_real_)