Tags: r, floating-point, timestamp, lubridate, nanotime

How to safely store millisecond differences between timestamps?


This is some hellish question related to floating-point approximations and timestamps in R. Get ready :) Consider this simple example:

library(tibble)
library(lubridate)
library(dplyr)

tibble(timestamp_chr1 = c('2014-01-02 01:35:50.858'),
       timestamp_chr2 = c('2014-01-02 01:35:50.800')) %>% 
  mutate(time1 = lubridate::ymd_hms(timestamp_chr1),
         time2 = lubridate::ymd_hms(timestamp_chr2),
         timediff = as.numeric(time1 - time2))


# A tibble: 1 x 5
  timestamp_chr1          timestamp_chr2          time1                      time2                       timediff
  <chr>                   <chr>                   <dttm>                     <dttm>                         <dbl>
1 2014-01-02 01:35:50.858 2014-01-02 01:35:50.800 2014-01-02 01:35:50.858000 2014-01-02 01:35:50.799999 0.0580001

Here the time difference between the two timestamps is obviously 58 milliseconds, yet R stores that with some floating-point approximation, so it appears as 0.0580001 seconds.

What is the safest way to get exactly 58 milliseconds as an answer instead? I thought about using as.integer() (instead of as.numeric()), but I am worried about some loss of information. What can be done here?
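
For what it's worth, printing the raw difference with more digits shows that it is not stored as exactly 0.058, which is also why plain truncation with as.integer() feels risky (the exact trailing digits may differ across platforms):

library(lubridate)

t1 <- ymd_hms("2014-01-02 01:35:50.858")
t2 <- ymd_hms("2014-01-02 01:35:50.800")

sprintf("%.12f", as.numeric(t1 - t2))
# very close to, but not exactly, "0.058000000000"

as.integer(as.numeric(t1 - t2) * 1000)
# [1] 58 here, but only because the stored difference happens to be slightly
# above 0.058; truncation toward zero would give 57 if the error went the other way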

Thanks!


Solution

  • Some considerations, some of which I think you already know:

    If you're concerned about introducing errors in the data, though, an alternative is to encode times as milliseconds (instead of the R norm of seconds). If you can choose an arbitrary and recent reference point (within about 24.8 days, since a signed 32-bit integer only holds about 2.1 billion milliseconds), then you can do it with a normal integer; if that is insufficient, or you prefer epoch milliseconds, then you need to jump to 64-bit integers, perhaps with bit64. A sketch of the reference-point variant follows the code below.

    now <- Sys.time()
    as.integer(now)                              # whole epoch seconds still fit in a 32-bit integer
    # [1] 1583507603
    as.integer(as.numeric(now) * 1000)           # epoch milliseconds overflow a 32-bit integer
    # Warning: NAs introduced by coercion to integer range
    # [1] NA
    bit64::as.integer64(as.numeric(now) * 1000)  # 64-bit integers can hold epoch milliseconds
    # integer64
    # [1] 1583507603439
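
    A sketch of the reference-point variant mentioned above (the names ref and ms_since_ref are just illustrative): millisecond offsets from a recent origin fit comfortably in a plain 32-bit integer, and rounding to the nearest millisecond removes the floating-point noise, so the example from the question comes out as exactly 58.

    library(lubridate)

    ref <- ymd_hms("2014-01-01 00:00:00")   # arbitrary, recent origin
    t1  <- ymd_hms("2014-01-02 01:35:50.858")
    t2  <- ymd_hms("2014-01-02 01:35:50.800")

    # milliseconds since the reference point, rounded to the nearest millisecond
    ms_since_ref <- function(t) {
      as.integer(round(as.numeric(difftime(t, ref, units = "secs")) * 1000))
    }

    ms_since_ref(t1) - ms_since_ref(t2)
    # [1] 58

    # the epoch-millisecond variant needs 64-bit integers, as above
    bit64::as.integer64(round(as.numeric(t1) * 1000)) -
      bit64::as.integer64(round(as.numeric(t2) * 1000))
    # integer64
    # [1] 58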