I have a data table (lv_timest
) with time stamps every 3 hours for each date:
# A tibble: 6 × 5
LV0_mean LV1_mean LV2_mean Date_time Date
<dbl> <dbl> <dbl> <S3:POSIXct> <date>
1 0.778 -4.12 0.736 2016-12-28 00:00:00 2016-12-28
2 0.376 -0.234 0.388 2016-12-28 03:00:00 2016-12-28
3 0.409 1.46 0.241 2016-12-28 06:00:00 2016-12-28
4 0.760 2.07 0.460 2016-12-28 09:00:00 2016-12-28
5 0.759 2.91 0.735 2016-12-28 12:00:00 2016-12-28
6 0.857 3.00 0.803 2016-12-28 15:00:00 2016-12-28
from which I would like to extract the time stamps that match as closely as possible those of another table (event_timest
):
# A tibble: 6 × 4
Event_number Date_time Date Date_time_new
<int> <S3: POSIXct> <date> <S3: POSIXct>
1 75 2016-12-28 08:00:00 2016-12-28 2016-12-28 08:00:00
2 123 2016-12-30 14:02:00 2016-12-30 2016-12-30 14:00:00
3 264 2017-01-07 06:12:00 2017-01-07 2017-01-07 06:00:00
4 317 2017-01-09 10:59:00 2017-01-09 2017-01-09 11:00:00
5 318 2017-01-09 13:31:00 2017-01-09 2017-01-09 14:00:00
6 369 2017-01-11 07:24:00 2017-01-11 2017-01-11 07:00:00
For example, for row 1 in table event_timest
, I would extract row 4 from table lv_timest
:
Event_number Date_time.x Date.x Date_time_new LV0_mean LV1_mean LV2_mean Date_time.y Date.y
<int> <S3: POSIXct> <date> <S3: POSIXct> <dbl> <dbl> <dbl> <S3: POSIXct> <date>
75 2016-12-28 08:00:00 2016-12-28 2016-12-28 08:00:00 0.760 2.07 0.460 2016-12-28 09:00:00 2016-12-28
In fact, the time difference should not be over one hour. I thought of using the fuzzyjoin
package for this, and writing a function that computes the time difference between timestamps of the two table, as hours. However, fuzzy_inner_join
replicates rows in the second table and takes several timestamps in the first table to match it.
require(lubridate)
require(fuzzyjoin)
diff_timest <- function(x, y){abs(x%--%y %/% hours(1)) <= 1} # time interval as hours ≤ 1 hour
match_timest <- fuzzy_inner_join(event_timest, lv_timest,
by = c("Date" = "Date",
"Date_time_new" = "Date_time"),
match_fun = list(`==`, diff_timest))
head(match_timest)
# A tibble: 6 × 9
Event_number Date_time.x Date.x Date_time_new LV0_mean LV1_mean LV2_mean Date_time.y Date.y
<int> <dttm> <date> <dttm> <dbl> <dbl> <dbl> <dttm> <date>
1 75 2016-12-28 08:00:00 2016-12-28 2016-12-28 08:00:00 0.760 2.07 0.460 2016-12-28 09:00:00 2016-12-28
2 123 2016-12-30 14:02:00 2016-12-30 2016-12-30 14:00:00 1.24 1.83 2.05 2016-12-30 15:00:00 2016-12-30
3 264 2017-01-07 06:12:00 2017-01-07 2017-01-07 06:00:00 -0.128 -5.43 2.72 2017-01-07 06:00:00 2017-01-07
4 317 2017-01-09 10:59:00 2017-01-09 2017-01-09 11:00:00 -0.0751 0.171 2.56 2017-01-09 09:00:00 2017-01-09
5 317 2017-01-09 10:59:00 2017-01-09 2017-01-09 11:00:00 -0.204 -0.797 2.28 2017-01-09 12:00:00 2017-01-09
6 318 2017-01-09 13:31:00 2017-01-09 2017-01-09 14:00:00 -0.204 -0.797 2.28 2017-01-09 12:00:00 2017-01-09
Would there be another way to do this?
I would suggest a standard join, followed by a grouped filter to the closest instance of each timestamp:
library(tidyverse)
library(lubridate)
match_timest <- event_timest %>%
inner_join(lv_timest, by = "Date") %>%
mutate(diff = abs(as.numeric(Date_time.x - Date_time.y, unit = "hours"))) %>%
group_by(Date_time.y) %>%
filter(diff <= 1 & diff == min(diff)) %>%
ungroup() %>%
select(!diff)
Note:
full_join(lv_timest, by = character())
) rather than the inner_join()
above.