sqloracle-databasegroup-bygaps-and-islandsmatch-recognize

SQL Oracle - Group by ID, task ID, min and max timestamp


I have data of users performing different tasks. I would like to group this data per userid and task id to get the start and end times per task. When the employee changes to another task, there should be a new row with new start and end time.

Example simplified data set:

userid taskid date_time_stamp (ascending)
1 task-A 16/6/2021 04:17:00
1 task-A 16/6/2021 04:19:00
1 task-A 16/6/2021 04:27:00
1 task-B 16/6/2021 04:31:00
1 task-B 16/6/2021 04:33:00
1 task-B 16/6/2021 04:36:00
1 task-A 16/6/2021 04:42:00
1 task-A 16/6/2021 04:44:00

example result

userid taskid first_dtm last_dtm
1 task-A 16/6/2021 04:17:00 16/6/2021 04:27:00
1 task-B 16/6/2021 04:31:00 16/6/2021 04:36:00
1 task-A 16/6/2021 04:42:00 16/6/2021 04:44:00

I understand that I should work with some min() and max() functions combined with a GROUP BY. However, grouping by userid and taskid, will result in only one row for task-A in this example.


Solution

  • From Oracle 12, you can use MATCH_RECOGNIZE:

    SELECT *
    FROM   table_name
    MATCH_RECOGNIZE(
      PARTITION BY userid
      ORDER BY date_time_stamp
      MEASURES
        FIRST(taskid) AS taskid,
        FIRST(date_time_stamp) AS start_date,
        LAST(date_time_stamp) AS end_date
      ONE ROW PER MATCH
      PATTERN ( same_task+ )
      DEFINE same_task AS FIRST(taskid) = taskid
    )
    

    Before that, you can use the ROW_NUMBER analytic function and aggregate:

    SELECT userid,
           taskid,
           MIN(date_time_stamp) AS start_date,
           MAX(date_time_stamp) AS end_date
    FROM   (
      SELECT t.*,
             ROW_NUMBER() OVER ( PARTITION BY userid ORDER BY date_time_stamp )
               - ROW_NUMBER() OVER ( PARTITION BY userid, taskid ORDER BY date_time_stamp )
               AS grp
      FROM   table_name t
    )
    GROUP BY userid, taskid, grp
    ORDER BY userid, start_date
    

    Which, for your sample data:

    CREATE TABLE table_name ( userid, taskid, date_time_stamp ) AS
    SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:17:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:19:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:27:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:31:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:33:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:36:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:42:00' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:44:00' HOUR TO SECOND FROM DUAL
    

    Both output:

    USERID TASKID START_DATE END_DATE
    1 task-A 2021-06-16 04:17:00 2021-06-16 04:27:00
    1 task-B 2021-06-16 04:31:00 2021-06-16 04:36:00
    1 task-A 2021-06-16 04:42:00 2021-06-16 04:44:00

    db<>fiddle here