I have data of users performing different tasks. I would like to group this data per userid and task id to get the start and end times per task. When the employee changes to another task, there should be a new row with new start and end time.
Example simplified data set:
userid | taskid | date_time_stamp (ascending) |
---|---|---|
1 | task-A | 16/6/2021 04:17:00 |
1 | task-A | 16/6/2021 04:19:00 |
1 | task-A | 16/6/2021 04:27:00 |
1 | task-B | 16/6/2021 04:31:00 |
1 | task-B | 16/6/2021 04:33:00 |
1 | task-B | 16/6/2021 04:36:00 |
1 | task-A | 16/6/2021 04:42:00 |
1 | task-A | 16/6/2021 04:44:00 |
example result
userid | taskid | first_dtm | last_dtm |
---|---|---|---|
1 | task-A | 16/6/2021 04:17:00 | 16/6/2021 04:27:00 |
1 | task-B | 16/6/2021 04:31:00 | 16/6/2021 04:36:00 |
1 | task-A | 16/6/2021 04:42:00 | 16/6/2021 04:44:00 |
I understand that I should work with some min() and max() functions combined with a GROUP BY. However, grouping by userid and taskid, will result in only one row for task-A in this example.
From Oracle 12, you can use MATCH_RECOGNIZE
:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY userid
ORDER BY date_time_stamp
MEASURES
FIRST(taskid) AS taskid,
FIRST(date_time_stamp) AS start_date,
LAST(date_time_stamp) AS end_date
ONE ROW PER MATCH
PATTERN ( same_task+ )
DEFINE same_task AS FIRST(taskid) = taskid
)
Before that, you can use the ROW_NUMBER
analytic function and aggregate:
SELECT userid,
taskid,
MIN(date_time_stamp) AS start_date,
MAX(date_time_stamp) AS end_date
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY userid ORDER BY date_time_stamp )
- ROW_NUMBER() OVER ( PARTITION BY userid, taskid ORDER BY date_time_stamp )
AS grp
FROM table_name t
)
GROUP BY userid, taskid, grp
ORDER BY userid, start_date
Which, for your sample data:
CREATE TABLE table_name ( userid, taskid, date_time_stamp ) AS
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:17:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:19:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:27:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:31:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:33:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:36:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:42:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:44:00' HOUR TO SECOND FROM DUAL
Both output:
USERID TASKID START_DATE END_DATE 1 task-A 2021-06-16 04:17:00 2021-06-16 04:27:00 1 task-B 2021-06-16 04:31:00 2021-06-16 04:36:00 1 task-A 2021-06-16 04:42:00 2021-06-16 04:44:00
db<>fiddle here