I have the following two dataframes (dput
below):
> df1
group date
1 A 2023-01-10
2 A 2023-01-15
3 B 2023-01-09
4 B 2023-01-12
> df2
group date1 date2 value
1 A 2023-01-09 2023-01-11 2
2 B 2023-01-11 2023-01-14 3
I would like to join df2 to df1 based on if date falls between date1 and date2 using data.table
. When I run the following code I get the following output:
library(data.table)
setDT(df1)
setDT(df2)
df2[df1,
.(group, date, date1, date2, value),
on = .(group, date1 <= date, date2 >= date)]
#> group date date1 date2 value
#> 1: A 2023-01-10 2023-01-10 2023-01-10 2
#> 2: A 2023-01-15 2023-01-15 2023-01-15 NA
#> 3: B 2023-01-09 2023-01-09 2023-01-09 NA
#> 4: B 2023-01-12 2023-01-12 2023-01-12 3
Created on 2023-01-11 with reprex v2.0.2
This is almost the desired output, but the values of date1 and date2 are now the same as date while I would like them to be the same as the were in df2. So the desired output should look like this:
#> group date date1 date2 value
#> 1: A 2023-01-10 2023-01-09 2023-01-11 2
#> 2: A 2023-01-15 NA NA NA
#> 3: B 2023-01-09 NA NA NA
#> 4: B 2023-01-12 2023-01-11 2023-01-14 3
So I was wondering if anyone knows how to keep the origin values of df2 when joining in data.table
?
dput
data:
df1 <- structure(list(group = c("A", "A", "B", "B"), date = structure(c(1673305200,
1673737200, 1673218800, 1673478000), class = c("POSIXct", "POSIXt"
), tzone = "")), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(group = c("A", "B"), date1 = structure(c(1673218800,
1673391600), class = c("POSIXct", "POSIXt"), tzone = ""), date2 = structure(c(1673391600,
1673650800), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(2,
3)), class = "data.frame", row.names = c(NA, -2L))
are you simply looking for x.date1
?
Columns of x can be referred to using the prefix x. and is particularly useful during joining to refer to x's join columns as they are otherwise masked by i's. For example, X[Y, .(x.a-i.a, b), on="a"]
df2[df1,
.(group, date, x.date1, x.date2, value),
on = .(group, date1 <= date, date2 >= date)]
group date x.date1 x.date2 value
<char> <POSc> <POSc> <POSc> <num>
1: A 2023-01-10 2023-01-09 2023-01-11 2
2: A 2023-01-15 <NA> <NA> NA
3: B 2023-01-09 <NA> <NA> NA
4: B 2023-01-12 2023-01-11 2023-01-14 3