rdplyr

Join/fill in data based on both row and column values


I am trying to make a new data frame that will overlay (so wherever there is a non-NA value in a cell, there is a corresponding value in the new data frame) with my original, which looks something like this:

    SiteData <- data.frame(
      Site = c("Apple", "Apple", "Pear"),
      ID   =  c(1:3),
      1/1/2020  = c(NA,1,1),
      1/8/2020  = c(1,1,1),
      1/15/2020 = c(1,0,NA),
      1/22/2020 = c(0,NA,NA)
    )
    
SiteData

    Site ID 1/1/2020 1/8/2020 1/15/2020 1/22/2020
    Apple 1   NA        1         1        0
    Apple 2    1        1         0        NA
    Pear  3    1        1         NA       NA

Based on the site, I would like to make it so that all 1s and 0s in the new data frames will be replaced with the appropriate weather data for that time.

Weather data example:

    WeatherData <- data.frame(
              Site = c("Apple", "Pear"),
              1/1/2020  = c(5,8),
              1/8/2020  = c(7,10),
              1/15/2020 = c(2,3),
              1/22/2020 = c(1,2)
            )  

WeatherData
  
    Site 1/1/2020 1/8/2020 1/15/2020 1/22/2020
    Apple   5       7          2        1
    Pear    8       10         3        2

Ultimately the goal would be for the end result to look like this:

Site ID 1/1/2020 1/8/2020 1/15/2020 1/22/2020
Apple 1   NA        7         2        1
Apple 2    5        7         2        NA
Pear  3    8       10         NA       NA

How do I accomplish this and join by both row value and column name?


Solution

  • I would suggest reshaping both tables long. Then it's a join to link them based on Site and name (in this case based on the date, stored as column name originally). The new data should use the second table's values (value.y) if value.x isn't NA. Then we can reshape wider again.

    library(tidyverse)
    left_join(     
      SiteData |> pivot_longer(-(Site:ID)),
      WeatherData |> pivot_longer(-Site), join_by(Site, name)) |>
      mutate(new = if_else(!is.na(value.x), value.y, NA)) |>
      select(-value.x, -value.y) |>
      pivot_wider(names_from = name, values_from = new)
    

    Result

      Site     ID X1.1.2020 X1.8.2020 X1.15.2020 X1.22.2020
      <chr> <int>     <dbl>     <dbl>      <dbl>      <dbl>
    1 Apple     1        NA         7          2          1
    2 Apple     2         5         7          2         NA
    3 Pear      3         8        10         NA         NA
    

    SiteData <- data.frame(
      Site = c("Apple", "Apple", "Pear"),
      ID   =  c(1:3),
      X1.1.2020  = c(NA,1,1),
      X1.8.2020  = c(1,1,1),
      X1.15.2020 = c(1,0,NA),
      X1.22.2020 = c(0,NA,NA)
    )
    
    WeatherData <- data.frame(
      Site = c("Apple", "Pear"),
      X1.1.2020  = c(5,8),
      X1.8.2020  = c(7,10),
      X1.15.2020 = c(2,3),
      X1.22.2020 = c(1,2)
    )