I have this dataset
library(dplyr)
# creating a dataframe
data_frame <- data.frame(id = c(1,1,2,2,3,3),
col2 = c("start", "finish", "start", "finish","start", "finish"),
col3 = c(as.POSIXct("2021-05-08 08:32:07"),
as.POSIXct("2021-05-08 08:32:45"),
as.POSIXct("2020-11-28 23:32:09"),
as.POSIXct("2020-11-28 23:32:25"),
as.POSIXct("2021-05-08 08:32:07"),
as.POSIXct("2021-05-08 08:32:12")))
# computing difference from start to finish
df <- data_frame %>%
arrange(id) %>%
mutate(timetaken = col3 - lag(col3))
df
I want
To calculate the time difference I can calculate the lag as show above. However, I want to have output in the following way and calculate the average time of that variable.
> df
id col2 col3 timetaken
1 1 start 2021-05-08 08:32:07 NA secs
2 1 finish 2021-05-08 08:32:45 38 secs
3 2 start 2020-11-28 23:32:09 NA secs
4 2 finish 2020-11-28 23:32:25 16 secs
5 3 start 2021-05-08 08:32:07 NA secs
6 3 finish 2021-05-08 08:32:12 5 secs
Any idea how to achieve this?
If you want to use dplyr::lag()
:
library(dplyr)
df <- data_frame |>
mutate(timetaken = col3 - lag(col3), .by = id)
df
# id col2 col3 timetaken
# 1 1 start 2021-05-08 08:32:07 NA secs
# 2 1 finish 2021-05-08 08:32:45 38 secs
# 3 2 start 2020-11-28 23:32:09 NA secs
# 4 2 finish 2020-11-28 23:32:25 16 secs
# 5 3 start 2021-05-08 08:32:07 NA secs
# 6 3 finish 2021-05-08 08:32:12 5 secs
and a dplyr()
approach to get the mean (although, as in the previous answer, I find using base R cleaner in this instance):
summarise(df, mean(timetaken, na.rm = TRUE))
# mean(timetaken, na.rm = TRUE)
# 1 19.66667 secs