To assess how much time is needed to answer a customer's query, I have the following data frame (df1) with columns ID, Task, Date_time
, where ID = customer number
, Task = Task done by the employee related to the query
, and Date_time = POSIXct column to specify when the task was conducted
.
For each customer, I want to find the duration in minutes between Task == "New"
and Task == "Closed"
.
To calculate duration in minutes, I have to consider that:
There could be multiple times that Task == "Closed" appears, therefore only the last Task == "Closed"
is to be considered in the calculation.
The working hours is from 8 am to 5 pm CET (8:00 to 17:00) Monday to Friday. The duration has to exclude non-working hours (5 pm to 8 am) and weekends (Saturday and Sunday).
Can someone please suggest how to calculate the duration, considering the above points? Thanks!
The data frame looks like this:
ID Task Date_time
1 customer1 New 2022-11-09 15:33:32
2 customer1 Edit 2022-11-09 15:38:40
4 customer1 Answered 2022-11-09 15:44:44
5 customer1 FeedbackRequired 2022-11-11 08:02:51
6 customer1 Closed 2022-11-17 15:04:23
8 customer2 New 2022-04-11 13:55:22
9 customer2 Edit 2022-04-11 13:59:53
11 customer2 Answered 2022-05-11 11:17:15
12 customer2 FeedbackRequired 2022-05-11 11:17:41
13 customer2 Closed 2022-08-17 13:23:29
15 customer2 Closed 2022-08-17 13:24:24
17 customer2 Closed 2022-08-17 13:32:41
Here is an example data frame:
df1 <- structure(list(ID = c("customer1", "customer1", "customer1",
"customer1", "customer1", "customer2", "customer2", "customer2",
"customer2", "customer2", "customer2", "customer2", "customer5",
"customer5", "customer5", "customer5", "customer5", "customer3",
"customer3", "customer3", "customer3", "customer3", "customer3",
"customer3", "customer3", "customer3", "customer3", "customer4",
"customer4", "customer4", "customer4", "customer4"), Task = c("New",
"Edit", "Answered", "FeedbackRequired", "Closed", "New", "Edit",
"Answered", "FeedbackRequired", "Closed", "Closed", "Closed",
"New", "Edit", "Answered", "FeedbackRequired", "Closed", "New",
"Edit", "HubAdded", "Answered", "FeedbackRequired", "Closed",
"Closed", "Closed", "Closed", "Closed", "New", "Edit", "Answered",
"FeedbackRequired", "Closed"), Date_time = structure(c(1668008012.93733,
1668008320.29733, 1668008684.57472, 1668153771.45687, 1668697463.01071,
1649685322.67473, 1649685593.46752, 1652267835.13924, 1652267861.07935,
1660742609.41271, 1660742664.11297, 1660743161.80927, 1678295469.58648,
1678295749.33997, 1678359922.0184, 1678787443.43049, 1680703787.10976,
1661514257.02831, 1661514383.23061, 1661526698.41032, 1661527095.83771,
1661527117.512, 1662457363.51916, 1662457378.0676, 1662457519.11092,
1663232439.58358, 1663246649.3237, 1680252406.63738, 1680253548.17636,
1680254179.34628, 1680254196.74463, 1680257109.1508), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(1L, 2L, 4L, 5L, 6L,
8L, 9L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 21L, 22L, 23L, 65L,
66L, 68L, 69L, 70L, 71L, 73L, 75L, 77L, 79L, 994L, 995L, 997L,
998L, 999L), class = "data.frame")
If Task == "New"
appears for each ID
only once, e.g.
> length(unique(df1[df1$Task == "New", "ID"])) == length(unique(df1$ID))
[1] TRUE
the following approach might work. businessDuration()
from {BusinessDuration}
comes in handy to calculate the difftime w.r.t. business hours. It seems like businessDuration()
is not vectorized, we therefore use Vectorize()
:
xyzzy =
merge(x = df1[df1$Task == "New", c("ID", "Date_time")],
y = with(df1[df1$Task == "Closed", ],
aggregate(list(Date_time = Date_time), list(ID = ID),
max, drop = FALSE)),
by = "ID") |>
`colnames<-`(c("ID", "New", "Closed")) # cosmetics
library(BusinessDuration)
f = Vectorize(\(x, y)
businessDuration(startdate = x, enddate = y,
starttime = "08:00:00", endtime = "17:00:00",
unit = "min"))
within(xyzzy, {wmin = f(x = New, y = Closed)})
gives
ID New Closed wmin
1 customer1 2022-11-09 15:33:32 2022-11-17 15:04:23 3210.85000
2 customer2 2022-04-11 13:55:22 2022-08-17 13:32:41 49657.31667
3 customer3 2022-08-26 11:44:17 2022-09-15 12:57:29 7633.20000
4 customer4 2023-03-31 08:46:46 2023-03-31 10:05:09 78.38333
5 customer5 2023-03-08 17:11:09 2023-04-05 14:09:47 10749.78333
Have a look at the help file (?businessDuration
) for further details, e.g.
weekendlist
Custom weekend list. Default is "Saturday" & "Sunday"
holidaylist
Custom holiday list. Default is NULL