I have three vectors of dates in POSIX format that correspond with data collection times from three large datasets. Each of these vectors is of a different length and have similar (but not identical) dates.
I would like to:
An example with three vectors of dates in POSIX format (I want to group similar dates between the vectors, allowing a time window of 30 days):
A.dates = as.POSIXlt(c("1998-07-24 08:00","1999-07-24 08:00","2000-07-24 08:00"),
tz = "America/Los_Angeles")
B.dates = as.POSIXlt(c("1998-07-25 08:00","1999-07-25 08:00","2000-07-25 08:00"),
tz = "America/Los_Angeles")
C.dates = as.POSIXlt(c("1998-07-26 08:00","1999-07-26 08:00","2000-07-26 08:00","2000-08-29"),
tz = "America/Los_Angeles")
Specifying a time window of 30 days, there would be three date groupings (the sampling dates from July 1998, 1999, and 2000). The C.dates
vector has a fourth collection date of August 29th, 2000 which would be excluded from the groupings because:
You could loop over each element of each vector and create seq
uences ± 15 days
L <- list(A.dates, B.dates, C.dates)
tmp <- lapply(L, function(x) lapply(x, function(x)
do.call(seq, c(as.list(as.Date(x) + c(-15, 15)), "day"))))
and union
ite them in the list.
tmp <- lapply(tmp, function(x) as.Date(Reduce(union, x), origin="1970-01-01"))
Then simply find the intersect
i <- Reduce(function(...) as.Date(intersect(...), origin="1970-01-01"), tmp)
and select the dates accordingly.
tmp <- lapply(L, function(x) x[as.Date(x) %in% i])
tmp
# [[1]]
# [1] "1998-07-24 08:00:00 PDT" "1999-07-24 08:00:00 PDT"
# [3] "2000-07-24 08:00:00 PDT"
#
# [[2]]
# [1] "1998-07-25 08:00:00 PDT" "1999-07-25 08:00:00 PDT"
# [3] "2000-07-25 08:00:00 PDT"
#
# [[3]]
# [1] "1998-07-26 PDT" "1999-07-26 PDT" "2000-07-26 PDT"
To sort them by year according to your comment, we first unlist
them. Unfortunately this converts the dates into numerics (i.e. seconds since January, 1 1970), so we need to convert them back.
tmp <- as.POSIXlt(unlist(lapply(tmp, as.POSIXct)), origin="1970-01-01",
tz="America/Los_Angeles")
Finally we split
the list by the first four substr
ings which is the year (we also could do split(tmp, strftime(tmp, "%Y"))
though).
res <- split(tmp, substr(tmp, 1, 4))
res
# $`1998`
# [1] "1998-07-24 08:00:00 PDT" "1998-07-25 08:00:00 PDT"
# [3] "1998-07-26 00:00:00 PDT"
#
# $`1999`
# [1] "1999-07-24 08:00:00 PDT" "1999-07-25 08:00:00 PDT"
# [3] "1999-07-26 00:00:00 PDT"
#
# $`2000`
# [1] "2000-07-24 08:00:00 PDT" "2000-07-25 08:00:00 PDT"
# [3] "2000-07-26 00:00:00 PDT"