Can someone explain to me why the following POSIXct time elements don't 'match'?
I'm ultimately trying to match the datetime values in vector `time` to those in vector `datetime`. I've simplified the example so that everything is on the same date and only the hour values differ. In this example, I can use `match` to find the index position of each `time` in `datetime`. However, if I need to round my datetimes in `time` to the nearest hour, given here as `roundtime`, the match suddenly fails, even though `time == roundtime` is TRUE for every element. What am I missing here?
datetime <- as.POSIXct("2020-01-01 15:00:00",tz="UTC") + (0:10) * 3600
time <- as.POSIXct(c("2020-01-01 15:00:00", "2020-01-01 16:00:00", "2020-01-01 21:00:00"),tz='UTC')
roundtime <- round(time, units = "hours")
time %in% datetime          # all TRUE
time %in% roundtime         # all FALSE?
time == roundtime           # all TRUE
match(time, datetime)       # returns matches
match(time, roundtime)      # no matches?
match(roundtime, datetime)  # no matches?
This answer expands on my comment to OP's original question.
The problem appears to be down to a silent class change caused by the use of `round`:
class(time)
[1] "POSIXct" "POSIXt"
class(datetime)
[1] "POSIXct" "POSIXt"
but
class(roundtime)
[1] "POSIXlt" "POSIXt"
The class change is performed without warning or note and is hidden by the default S3 `print` methods:
print(datetime[1])
[1] "2020-01-01 15:00:00 UTC"
print(time[1])
[1] "2020-01-01 15:00:00 UTC"
print(roundtime[1])
[1] "2020-01-01 15:00:00 UTC"
However, it's obvious when the objects are `unclass`ed:
print(unclass(datetime[1]))
[1] 1577890800
attr(,"tzone")
[1] "UTC"
print(unclass(time[1]))
[1] 1577890800
attr(,"tzone")
[1] "UTC"
print(unclass(roundtime[1]))
$sec
[1] 0
$min
[1] 0
$hour
[1] 15
$mday
[1] 1
$mon
[1] 0
$year
[1] 120
$wday
[1] 3
$yday
[1] 0
$isdst
[1] 0
attr(,"tzone")
[1] "UTC"
`match` appears to fail (although it is actually working correctly) because, as is made clear in its online documentation, "Factors, raw vectors and lists are converted to character vectors, internally classed objects are transformed via mtfrm, and then x and table are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching. ... Exactly what matches what is to some extent a matter of definition". Here `roundtime` is a list of date-time components underneath, whereas `time` and `datetime` are plain numbers of seconds, so the two are not compared on the same footing.
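If you would rather stay in base R, one possible workaround (a minimal sketch, not the only option) is to convert the rounded value back to POSIXct with `as.POSIXct()` before matching. The name `roundtime_ct` is introduced here purely for illustration:

```r
# Same data as in the question
datetime <- as.POSIXct("2020-01-01 15:00:00", tz = "UTC") + (0:10) * 3600
time <- as.POSIXct(c("2020-01-01 15:00:00", "2020-01-01 16:00:00",
                     "2020-01-01 21:00:00"), tz = "UTC")

# round() on a POSIXct returns a POSIXlt (a list of components)
roundtime <- round(time, units = "hours")
class(roundtime)

# Convert back to POSIXct so both sides of match() are numeric seconds
roundtime_ct <- as.POSIXct(roundtime)   # illustrative name, not in the original
class(roundtime_ct)

match(time, roundtime_ct)       # positions of time within the rounded vector
match(roundtime_ct, datetime)   # positions of the rounded values in datetime
time %in% roundtime_ct
```

Once both vectors are POSIXct again, `match` and `%in%` compare the underlying seconds, which is what the question expected.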
The problem can be avoided by using `lubridate` (and, as an additional benefit, I think the relevant function names make the programmer's intention far more transparent):
library(lubridate)
roundtime1 <- floor_date(datetime, "hour")
class(roundtime1)
[1] "POSIXct" "POSIXt"
match(time, roundtime1)
[1] 1 2 7
match(roundtime1, datetime)
[1] 1 2 3 4 5 6 7 8 9 10 11
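Note that `floor_date` always rounds down, whereas the question asks for the nearest hour. If that distinction matters, `round_date` from the same package may be the closer equivalent of `round(time, units = "hours")`. The following is a sketch applied to `time` rather than `datetime`; the names `roundtime2` and `x` are illustrative only:

```r
library(lubridate)

datetime <- as.POSIXct("2020-01-01 15:00:00", tz = "UTC") + (0:10) * 3600
time <- as.POSIXct(c("2020-01-01 15:00:00", "2020-01-01 16:00:00",
                     "2020-01-01 21:00:00"), tz = "UTC")

# round_date() rounds to the nearest unit and keeps the POSIXct class
roundtime2 <- round_date(time, "hour")   # illustrative name
class(roundtime2)
match(roundtime2, datetime)

# The difference from floor_date() only shows up off the hour
x <- as.POSIXct("2020-01-01 15:40:00", tz = "UTC")   # illustrative value
floor_date(x, "hour")   # rounds down to 15:00
round_date(x, "hour")   # rounds to the nearest hour, 16:00
```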
As @r2evans writes, this is a silent, nasty gotcha that (whilst documented and easily fixed) can lead to unexpected behaviour that is difficult to identify and avoid.