I am working to calculate the difference in hours between two timestamps in a dataframe. One timestamp is in the previous row i.e. lasttime
and the other is in the leading row i.e. origtime
. Here is a small example:-
library(dplyr)
df<-structure(list(DateTime_Start = structure(c(1612284131.736, 1612296614.851
), tzone = "UTC", class = c("POSIXct", "POSIXt")), DateTime_End = structure(c(1612284195.736,
1612296715.851), tzone = "UTC", class = c("POSIXct", "POSIXt"
))), row.names = c(NA, -2L), class = "data.frame")
df
# DateTime_Start DateTime_End
#1 2021-02-02 16:42:11 2021-02-02 16:43:15
#2 2021-02-02 20:10:14 2021-02-02 20:11:55
lasttime<-lag(df$DateTime_End)
lasttime
#[1] NA "2021-02-02 16:43:15 UTC"
origtime<-lead(df$DateTime_Start)
origtime
#[1] "2021-02-02 20:10:14 UTC" NA
You can notice that it returns an NA next to each run of lasttime
and origtime
, seems to be a place holder for the other timestamp on the row within df
. Then when I try to calculate difftime
, I get this:-
difftime(origtime,lastime)
#Time differences in secs
#[1] NA NA
I think this is caused by the NA
in origtime
and lasttime
. Can anyone point out a solution to get the difference between these two datetime objects?
Many thanks!
The 1st value for lag
and last value for lead
is always NA
by default. Since there is no previous value of 1st row in lag
and no next value of last row in lead
.
You need to change the default
value to get value which is not NA
. For example, to get 0 as first value you can do -
library(dplyr)
df %>%
mutate(difference = difftime(DateTime_Start, lag(DateTime_End, default = first(DateTime_Start)), units = 'hours'))
# DateTime_Start DateTime_End difference
#1 2021-02-02 16:42:11 2021-02-02 16:43:15 0.000000 hours
#2 2021-02-02 20:10:14 2021-02-02 20:11:55 3.449754 hours
In the second row you are calculating DateTime_Start[2] - DateTime_End[1]
. If you have more than 2 rows in the 3rd row you'll get output for DateTime_Start[3] - DateTime_End[2]
and so on.