Tags: r, dplyr, coalesce

My old code for coalescing two data frames in R no longer works, even though I haven't changed it


I'm back to an older R script I wrote a year ago, in which I coalesced two data frames that share the same column names. I used this simple code:

    temp_2_1 <- dplyr::coalesce(tp2, tp1)

tp2 and tp1 are both data frames with the same 4 column names. Previously, for each of the four columns, the code looked in tp2 and, wherever it found an NA, pulled the value from the same position in tp1. The resulting data frame was stored in temp_2_1.

Now nothing is coalesced and temp_2_1 is simply a copy of tp2, even though my code did not change. I'm on R version 4.2.3 (2023-03-15 ucrt) with dplyr_1.1.1. I was probably using older versions of both before, but I can't find any documentation of a change that might explain this.

I've tried it without the dplyr:: prefix, and I've tried it column by column, but nothing works.

I expect the NAs in tp2 to be filled in from tp1, as before.


Solution

  • dplyr::coalesce used to coalesce two data.frames column by column. Since v1.1.0 it follows the vctrs definition of missingness, under which a data frame row counts as missing only when every value in that row is missing.

    For example, consider these two data.frames:

    library(dplyr)
    
    ex1 <- data.frame(v1 = c(1, 2, NA), v2 = c(1, NA, NA))
    ex2 <- data.frame(v1 = 4:6, v2 = 4:6)
    

    In ex1, row 2 is only partly missing (v2 is NA), while row 3 is completely missing. Using coalesce:

    coalesce(ex1, ex2)
    
      v1 v2
    1  1  1
    2  2 NA
    3  6  6
    

    Only row 3 is replaced with the values from ex2; the NA in row 2 (column v2) is left as is.

    To get the old behaviour, call coalesce on each column yourself. For example, here is a simple purrr solution:

    library(purrr)
    map2_dfc(ex1, ex2, coalesce)
    
    # A tibble: 3 × 2
         v1    v2
      <dbl> <dbl>
    1     1     1
    2     2     5
    3     6     6
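You might prefer to keep a plain data.frame (map2_dfc returns a tibble) and to pair columns by name instead of by position. In that case, the same column-wise idea can be wrapped in a small helper. This is a minimal sketch, not part of dplyr. `coalesce_by_column()` is a hypothetical name, and the helper assumes that every column of the first data frame also exists in the second one, with compatible types. The sketch also calls `vctrs::vec_detect_missing()` to show which rows the new rule treats as missing. vctrs is a dependency of dplyr, and that function needs vctrs 0.5.0 or later.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): coalesce x with y one column
# at a time, matching columns by name, and keep x's class (e.g. data.frame).
# Assumes every column of x exists in y with a compatible type.
coalesce_by_column <- function(x, y) {
  missing_cols <- setdiff(names(x), names(y))
  if (length(missing_cols) > 0) {
    stop("Columns not found in y: ", paste(missing_cols, collapse = ", "))
  }
  # Map() pairs each column of x with the same-named column of y;
  # x[] <- ... replaces the column contents while keeping x's attributes.
  x[] <- Map(coalesce, x, y[names(x)])
  x
}

ex1 <- data.frame(v1 = c(1, 2, NA), v2 = c(1, NA, NA))
ex2 <- data.frame(v1 = 4:6, v2 = 4:6)

# Row-wise missingness as used by coalesce() in dplyr >= 1.1.0:
# only row 3, where every value is NA, counts as missing.
vctrs::vec_detect_missing(ex1)

coalesce_by_column(ex1, ex2)
#   v1 v2
# 1  1  1
# 2  2  5
# 3  6  6
```

For the data in the question, this would be `temp_2_1 <- coalesce_by_column(tp2, tp1)`. Because columns are matched by name, the result does not depend on tp1 listing its columns in the same order as tp2. Any column that is present in tp1 but not in tp2 is ignored.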