I'm trying to join one database of periods like this one:
id = c(rep(1,3), rep(2,3), rep(3,3))
start = as.Date(c("2014-07-01", "2015-03-12", "2016-08-13", "2014-07-01", "2015-03-12", "2016-08-13", "2014-07-01", "2015-03-12", "2016-08-13"))
end = as.Date(c("2015-03-11", "2015-08-12", "2018-12-31", "2015-03-11", "2015-08-12", "2018-12-31","2015-03-11", "2015-08-12", "2018-12-31"))
DT = data.table(id, start, end)
DT
id start end
1: 1 2014-07-01 2015-03-11
2: 1 2015-03-12 2015-08-12
3: 1 2016-08-13 2018-12-31
4: 2 2014-07-01 2015-03-11
5: 2 2015-03-12 2015-08-12
6: 2 2016-08-13 2018-12-31
7: 3 2014-07-01 2015-03-11
8: 3 2015-03-12 2015-08-12
9: 3 2016-08-13 2018-12-31
with one that has clinical registries (weight and height) like this one:
id_clin = (c(rep(1,2), rep (2,3), rep(3,4)))
date = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01", "2014-08-01", "2015-02-01", "2017-06-01", "2018-03-05", "2018-09-01", "2018-11-30"))
weight = c(60, 65, 62, 75, 68, 90 , 102, 104 , 98 )
height = c(160, 160, 170, 175, 170, 200, 200, 200 ,200)
DT_clin = data.table(id_clin, date, weight, height)
DT_clin
id_clin date weight height
1: 1 2014-10-23 60 160
2: 1 2016-09-01 65 160
3: 2 2017-01-01 62 170
4: 2 2014-08-01 75 175
5: 2 2015-02-01 68 170
6: 3 2017-06-01 90 200
7: 3 2018-03-05 102 200
8: 3 2018-09-01 104 200
9: 3 2018-11-30 98 200
The desired outcome would look something like this*:
id start end date date2 weight height
1: 1 2014-07-01 2015-03-11 2014-10-23 2014-10-23 60.0 160.0
2: 1 2015-03-12 2015-08-12 <NA> <NA> NA NA
3: 1 2016-08-13 2018-12-31 2016-09-01 2016-09-01 65.0 160.0
4: 2 2014-07-01 2015-03-11 2014-08-01 2015-02-01 71.5 172.5
5: 2 2015-03-12 2015-08-12 <NA> <NA> NA NA
6: 2 2016-08-13 2018-12-31 2017-01-01 2017-01-01 62.0 170.0
7: 3 2014-07-01 2015-03-11 <NA> <NA> NA NA
8: 3 2015-03-12 2015-08-12 <NA> <NA> NA NA
9: 3 2016-08-13 2018-12-31 2018-03-05 2018-11-30 101.3 200.0
Also, if there is a way to make more than one operation for the different variables I will also be interested to know a way. (ex. calculate the mean of weight and the max value of height at the same time i make the join)
I've tested foverlaps with good results when there is only one value, but I'm unable to acomplish my objective when there is more than one value that overlap:
setkey(DT, id, start, end)
setkey(DT_clin, id_clin, date, date2)
foverlaps(DT[id == "1", ], DT_clin[id == "1",], by.x =c("id", "start", "end") , by.y = c("id_clin", "date", "date2" ), nomatch = NA )
Should I use Non-Equi-Joins?
*I duplicated date to create date2 and faked an interval of time
With foverlaps
:
library(data.table)
setkey(DT_clin, id_clin, date, date2)
foverlaps(DT, DT_clin,
by.x =c("id", "start", "end"),
by.y = c("id_clin", "date", "date2" ), nomatch = NA )[
,.(datemin = min(date),
datemax = max(date),
weight = mean(weight,na.r=T),
height = mean(height,na.rm=T)),
by=.(id,start,end)]
id start end datemin datemax weight height
1: 1 2014-07-01 2015-03-11 2014-10-23 2014-10-23 60.0 160.0
2: 1 2015-03-12 2015-08-12 <NA> <NA> NaN NaN
3: 1 2016-08-13 2018-12-31 2016-09-01 2016-09-01 65.0 160.0
4: 2 2014-07-01 2015-03-11 2014-08-01 2015-02-01 71.5 172.5
5: 2 2015-03-12 2015-08-12 <NA> <NA> NaN NaN
6: 2 2016-08-13 2018-12-31 2017-01-01 2017-01-01 62.0 170.0
7: 3 2014-07-01 2015-03-11 <NA> <NA> NaN NaN
8: 3 2015-03-12 2015-08-12 <NA> <NA> NaN NaN
9: 3 2016-08-13 2018-12-31 2017-06-01 2018-11-30 98.5 200.0