I have two data frames in R that share SOME columns and SOME rows in an arbitrary order.
I now want to create a third data frame that has the columns and rows of data frame 1 merged with the corresponding values from data frame 2:
df1 <- data.frame(name = c("a", "b", "c"),
col1 = c(NA, NA, NA),
col2 = c(NA, NA, NA),
col3 = c(NA, NA, NA))
df2 <- data.frame(name = c("d", "b", "a"),
col2 = c(1, 2, 3),
col1 = c(4, 5, 6),
col4 = c(NA, NA, NA))
My expected output is:
name col1 col2 col3
1 a 6 3 NA
2 b 5 2 NA
3 c NA NA NA
I know how I could achieve this step by step by first creating a df with common cols and rows from both data frames, sort accordingly etc.
There's also the (my currently) preferred option of first pivot_longer both data frames, then do a left_join, then pivot_wider.
However, I'm wondering if there is a more direct (and tidy) way similar to Excel's combination of index & match where I would match columns and rows and write the df2 values directly into the respective df1 cell?
That's pretty much what dplyr::rows_patch() does, with few safeguards (or limitations) :
- all columns in y(df2) must exist in x(df1)
- types of matching x & y columns should be compatible when casting from y to x.
For the former there's nothing that select(df2, any_of(names(df1)) can't fix. But regarding types, in the included example all-NA columns of df1 are logical (as class(NA) is "logical") while matching df2 columns are numerical and that would trigger a casting error:
Can't convert from
y$col1<double> tox$col1<logical> due to loss of precision.
Which may or may not be relevant with your real data.
library(dplyr)
# change types for all-NA columns
glimpse(df1)
#> Rows: 3
#> Columns: 4
#> $ name <chr> "a", "b", "c"
#> $ col1 <lgl> NA, NA, NA
#> $ col2 <lgl> NA, NA, NA
#> $ col3 <lgl> NA, NA, NA
df1 <- mutate(df1, across(where( \(x) all(is.na(x) )), as.numeric))
glimpse(df1)
#> Rows: 3
#> Columns: 4
#> $ name <chr> "a", "b", "c"
#> $ col1 <dbl> NA, NA, NA
#> $ col2 <dbl> NA, NA, NA
#> $ col3 <dbl> NA, NA, NA
rows_patch(
x = df1,
y = select(df2, any_of(names(df1))),
by = "name",
unmatched = "ignore"
)
#> name col1 col2 col3
#> 1 a 6 3 NA
#> 2 b 5 2 NA
#> 3 c NA NA NA
If you do construct your target frames yourself, you could perhaps start with something like this instead:
df1 <- data.frame(name = c("a", "b", "c"),
col1 = NA_real_, col2 = NA_real_, col3 = NA_real_)