rdplyrlagleaddifftime

Calculating difference between two timestamps in R; NA appears alongside time difference output


I am working to calculate the difference in hours between two timestamps in a dataframe. One timestamp is in the previous row i.e. lasttime and the other is in the leading row i.e. origtime. Here is a small example:-

library(dplyr)
df<-structure(list(DateTime_Start = structure(c(1612284131.736, 1612296614.851
), tzone = "UTC", class = c("POSIXct", "POSIXt")), DateTime_End = structure(c(1612284195.736, 
                                                                              1612296715.851), tzone = "UTC", class = c("POSIXct", "POSIXt"
                                                                              ))), row.names = c(NA, -2L), class = "data.frame")

df

#       DateTime_Start        DateTime_End
#1 2021-02-02 16:42:11 2021-02-02 16:43:15
#2 2021-02-02 20:10:14 2021-02-02 20:11:55

lasttime<-lag(df$DateTime_End)
lasttime
#[1] NA                        "2021-02-02 16:43:15 UTC"

origtime<-lead(df$DateTime_Start)
origtime
#[1] "2021-02-02 20:10:14 UTC" NA

You can notice that it returns an NA next to each run of lasttime and origtime, seems to be a place holder for the other timestamp on the row within df. Then when I try to calculate difftime, I get this:-

difftime(origtime,lastime)
#Time differences in secs
#[1] NA NA

I think this is caused by the NA in origtime and lasttime. Can anyone point out a solution to get the difference between these two datetime objects?

Many thanks!


Solution

  • The 1st value for lag and last value for lead is always NA by default. Since there is no previous value of 1st row in lag and no next value of last row in lead.

    You need to change the default value to get value which is not NA. For example, to get 0 as first value you can do -

    library(dplyr)
    
    df %>%
      mutate(difference = difftime(DateTime_Start, lag(DateTime_End, default = first(DateTime_Start)), units = 'hours'))
    
    #       DateTime_Start        DateTime_End     difference
    #1 2021-02-02 16:42:11 2021-02-02 16:43:15 0.000000 hours
    #2 2021-02-02 20:10:14 2021-02-02 20:11:55 3.449754 hours
    

    In the second row you are calculating DateTime_Start[2] - DateTime_End[1]. If you have more than 2 rows in the 3rd row you'll get output for DateTime_Start[3] - DateTime_End[2] and so on.