Replicate Index/Match in Tidyverse with date range and multiple criteria

I'm trying to match two datasets that have a common unique ID. One has a date and the other has date ranges. How do I get values from one into the other where the unique ID matches and the date is within the date range?

I tried case_when and it does work IF they are the same size, but the two datasets are not, so I'm stumped on what to do.

It doesn't necessarily have to be a tidyverse solution

library(tidyverse)

dates <- c("2024-01-01", "2024-03-27", "2024-05-15")
unique_keys <- c(123, 456, 789)
value1 <- c(10, 30, 20)

data_df <- tibble(dates, unique_keys, value1)
data_df

unique_keys <- c(123, 456)
start_date <- c("2023-12-15", "2024-04-01")
end_date <- c("2024-01-02", "2024-06-01")
value2 <- c(11, 28)

key_df <- tibble(unique_keys, start_date, end_date, value2)
key_df

# Goal: 
# Take value2 from the key_df and bring it into data_df when both the unique_key matches and the date is within the range

# In the example dataset:
# data_df row1 matches up with key_df row 1, so I want a new column in data_df that shows 11
# data_df row2 unique_keys matches with key_df row 2, BUT the date is wrong. Can just show NA

data_df %>% 
  mutate(value2 = case_when(
    data_df$unique_keys == key_df$unique_keys & data_df$dates >= key_df$start_date & data_df$dates <= key_df$end_date ~ key_df$value2
  ))

Solution

You are probably looking for an overlap joins (see R4ds). From your example, I changed dates expressed in characters to date representations with lubridate::ymd() and used a full_join() with join_by(unique_keys,between(dates, start_date, end_date)) :

data_df %>% 
  full_join( key_df, 
             join_by(unique_keys,between(dates, start_date, end_date)))

You might prefer an inner_join() or a left_join()

Full reprex:

library(tidyverse)
dates <- c("2024-01-01", "2024-03-27", "2024-05-15")
unique_keys <- c(123, 456, 789)
value1 <- c(10, 30, 20)

data_df <- tibble(dates, unique_keys, value1) %>% 
  mutate(dates = ymd(dates), 
         dates = ymd(dates))
data_df
#> # A tibble: 3 × 3
#>   dates      unique_keys value1
#>   <date>           <dbl>  <dbl>
#> 1 2024-01-01         123     10
#> 2 2024-03-27         456     30
#> 3 2024-05-15         789     20

unique_keys <- c(123, 456)
start_date <- c("2023-12-15", "2024-04-01")
end_date <- c("2024-01-02", "2024-06-01")
value2 <- c(11, 28)

key_df <- tibble(unique_keys, start_date, end_date, value2) %>% 
  mutate(start_date = ymd(start_date), 
         end_date = ymd(end_date))
key_df
#> # A tibble: 2 × 4
#>   unique_keys start_date end_date   value2
#>         <dbl> <date>     <date>      <dbl>
#> 1         123 2023-12-15 2024-01-02     11
#> 2         456 2024-04-01 2024-06-01     28


# See https://r4ds.hadley.nz/joins#overlap-joins 

data_df %>% 
  full_join( key_df, 
             join_by(unique_keys,between(dates, start_date, end_date)))
#> # A tibble: 4 × 6
#>   dates      unique_keys value1 start_date end_date   value2
#>   <date>           <dbl>  <dbl> <date>     <date>      <dbl>
#> 1 2024-01-01         123     10 2023-12-15 2024-01-02     11
#> 2 2024-03-27         456     30 NA         NA             NA
#> 3 2024-05-15         789     20 NA         NA             NA
#> 4 NA                 456     NA 2024-04-01 2024-06-01     28