r datetime posixct

Why does rounding a POSIXct change the ability to match an equivalent POSIXct?


Can someone explain to me why the following POSIXct time elements don't 'match'?

I'm ultimately trying to match the datetime values in the vector time to those in the vector datetime. I've simplified the example so that everything falls on the same date and only the hour values differ. In this example, I can use match to find the index position of each time in datetime. However, if I round the datetimes in time to the nearest hour, given here as roundtime, the match suddenly fails, even though time == roundtime. What am I missing here?

datetime <- as.POSIXct("2020-01-01 15:00:00", tz = "UTC") + (0:10) * 3600
time <- as.POSIXct(c("2020-01-01 15:00:00", "2020-01-01 16:00:00", "2020-01-01 21:00:00"), tz = "UTC")
roundtime <- round(time, units = "hours")

time %in% datetime         # all TRUE
time %in% roundtime        # all FALSE?
time == roundtime          # all TRUE
match(time, datetime)      # returns matches
match(time, roundtime)     # no matches?
match(roundtime, datetime) # no matches?

Solution

  • This answer expands on my comment to OP's original question.

    The problem appears to be down to a silent class change caused by the use of round:

    class(time)
    [1] "POSIXct" "POSIXt" 
    class(datetime)
    [1] "POSIXct" "POSIXt" 
    

    but

    class(roundtime)
    [1] "POSIXlt" "POSIXt"
    

    The class change is performed without warning or note and is hidden by the default S3 print methods:

    print(datetime[1])
    [1] "2020-01-01 15:00:00 UTC"
    print(time[1])
    [1] "2020-01-01 15:00:00 UTC"
    print(roundtime[1])
    [1] "2020-01-01 15:00:00 UTC"
    

    However, it's obvious when the objects are unclassed:

    print(unclass(datetime[1]))
    [1] 1577890800
    attr(,"tzone")
    [1] "UTC"
    print(unclass(time[1]))
    [1] 1577890800
    attr(,"tzone")
    [1] "UTC"
    print(unclass(roundtime[1]))
    $sec
    [1] 0
    
    $min
    [1] 0
    
    $hour
    [1] 15
    
    $mday
    [1] 1
    
    $mon
    [1] 0
    
    $year
    [1] 120
    
    $wday
    [1] 3
    
    $yday
    [1] 0
    
    $isdst
    [1] 0
    
    attr(,"tzone")
    [1] "UTC"
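
    A quicker check than unclassing is to compare the underlying storage types. This is a minimal sketch that rebuilds the vectors from the question (the unit is passed positionally to round here); it only inspects the objects and does not change them:

    ```r
    time <- as.POSIXct(c("2020-01-01 15:00:00", "2020-01-01 16:00:00",
                         "2020-01-01 21:00:00"), tz = "UTC")
    roundtime <- round(time, "hours")   # unit passed positionally

    typeof(time)                    # "double": seconds since the epoch
    typeof(roundtime)               # "list": one component per date-time field
    inherits(roundtime, "POSIXct")  # FALSE
    ```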
    

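    match appears to fail (although it is actually working as documented) because of how it prepares its arguments. Its online documentation states: "Factors, raw vectors and lists are converted to character vectors, internally classed objects are transformed via mtfrm, and then x and table are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching. ... Exactly what matches what is to some extent a matter of definition". Here, time and datetime are stored as numbers of seconds since the epoch, whereas roundtime is stored as a list of date-time components, so the two sides are evidently not reduced to the same representation before they are compared.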
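
    For a fix that stays in base R, one option is to convert the result of round back to POSIXct before matching. The following is a minimal sketch, assuming the time and datetime vectors from the question; roundtime_ct is a name introduced here purely for illustration:

    ```r
    datetime <- as.POSIXct("2020-01-01 15:00:00", tz = "UTC") + (0:10) * 3600
    time <- as.POSIXct(c("2020-01-01 15:00:00", "2020-01-01 16:00:00",
                         "2020-01-01 21:00:00"), tz = "UTC")

    # round() hands back POSIXlt, so convert explicitly to POSIXct
    # (roundtime_ct is a hypothetical name, not from the original post)
    roundtime_ct <- as.POSIXct(round(time, "hours"), tz = "UTC")

    class(roundtime_ct)            # "POSIXct" "POSIXt"
    match(time, roundtime_ct)      # 1 2 3
    match(roundtime_ct, datetime)  # 1 2 7
    ```

    With both arguments of match in the same class, the comparison is between like and like.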

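    The problem can be avoided by using lubridate, whose rounding functions return POSIXct for POSIXct input (and, as an additional benefit, I think the relevant function names make the programmer's intention far more transparent). Note that floor_date rounds down, whereas round_date rounds to the nearest unit: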

    library(lubridate)
    roundtime1 <- floor_date(datetime, "hour")
    class(roundtime1)
    [1] "POSIXct" "POSIXt" 
    match(time, roundtime1)
    [1] 1 2 7
    match(roundtime1, datetime)
     [1]  1  2  3  4  5  6  7  8  9 10 11 
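
    To round to the nearest hour, as the question asks, lubridate's round_date can be used in the same way. This is a sketch only: the offhour vector is hypothetical data made up here so that the rounding actually moves the values, and datetime is the vector from the question:

    ```r
    library(lubridate)

    datetime <- as.POSIXct("2020-01-01 15:00:00", tz = "UTC") + (0:10) * 3600

    # hypothetical off-the-hour times, not part of the original question
    offhour <- as.POSIXct(c("2020-01-01 15:10:00", "2020-01-01 16:40:00"),
                          tz = "UTC")

    rounded <- round_date(offhour, unit = "hour")  # 15:00 and 17:00
    class(rounded)            # "POSIXct" "POSIXt"
    match(rounded, datetime)  # 1 3
    ```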
    

    As @r2evans writes, this is a silent, nasty gotcha that (whilst documented and easily fixed) can lead to unexpected behaviour that is difficult to identify and avoid.