sql mariadb window-functions mariadb-10.5

MySQL Nested aggregation query with selection of specific intermediate items


This is my activities table.

activities
+----+---------+----------+-----------------+
| id | user_id | activity |    log_time     |
+----+---------+----------+-----------------+
|  6 |       1 | start    | 12 Oct, 1000hrs |
|  2 |       1 | task     | 12 Oct, 1010hrs |
|  7 |       1 | task     | 12 Oct, 1040hrs |
|  3 |       1 | start    | 12 Oct, 1600hrs |
|  1 |       1 | task     | 12 Oct, 1610hrs |
|  9 |       1 | start    | 14 Oct, 0800hrs |
| 10 |       1 | start    | 16 Oct, 0900hrs |
|  4 |       1 | task     | 16 Oct, 0910hrs |
|  8 |       2 | start    | 12 Oct, 1000hrs |
|  5 |       2 | task     | 12 Oct, 1020hrs |
+----+---------+----------+-----------------+

I need the total time spent by each user across all sessions. Each session happens within a single day and consists of a 'start' followed by any number of 'task' rows (until the next session is initiated with another 'start'). A session's duration is the timestamp difference between its last task and its start.

output
+---------+------------+------------------------------------------------+
| user_id | total_time |       This is explanation (not a column)       |
+---------+------------+------------------------------------------------+
|       1 |         60 | 12_Oct[40+10] + 14_Oct[0] + 16_Oct[10] = 60min |
|       2 |         20 | 12_Oct[20]  = 20min                            |
+---------+------------+------------------------------------------------+

I am unable to figure out how to get the last task in a session. I have tried basic aggregation and join queries, but they don't work.

I think what I really need is a way to compute the last column below (session_group); then I can aggregate by it and take the difference between the maximum and minimum timestamps.

+----+---------+----------+-----------------+---------------+
| id | user_id | activity |    log_time     | session_group |
+----+---------+----------+-----------------+---------------+
|  6 |       1 | start    | 12 Oct, 1000hrs |             1 |
|  2 |       1 | task     | 12 Oct, 1010hrs |             1 |
|  7 |       1 | task     | 12 Oct, 1040hrs |             1 |
|  3 |       1 | start    | 12 Oct, 1600hrs |             2 |
|  1 |       1 | task     | 12 Oct, 1610hrs |             2 |
|  9 |       1 | start    | 14 Oct, 0800hrs |             3 |
| 10 |       1 | start    | 16 Oct, 0900hrs |             4 |
|  4 |       1 | task     | 16 Oct, 0910hrs |             4 |
|  8 |       2 | start    | 12 Oct, 1000hrs |             5 |
|  5 |       2 | task     | 12 Oct, 1020hrs |             5 |
+----+---------+----------+-----------------+---------------+

Please let me know whether it is even possible to get the desired output in SQL (MySQL), and how to go about it. Or is it necessary to loop through the data in, say, JavaScript?

Below are the MySQL statements to create and populate the table:

create table activities (
  id INT NOT NULL, 
  user_id INT NULL, 
  activity VARCHAR(45), 
  log_time DATETIME NOT NULL DEFAULT NOW(),
  PRIMARY KEY(id))
 ENGINE = InnoDB;    
 
insert into activities
    (id, user_id, activity, log_time) 
values
    (6,1,'start', '2021-10-12 10:00:00'), 
    (2,1,'task' , '2021-10-12 10:10:00'), 
    (7,1,'task' , '2021-10-12 10:40:00'), 
    (3,1,'start', '2021-10-12 16:00:00'), 
    (1,1,'task',  '2021-10-12 16:10:00'), 
    (9,1,'start', '2021-10-14 08:00:00'), 
    (10,1,'start','2021-10-16 09:00:00'), 
    (4,1,'task',  '2021-10-16 09:10:00'), 
    (8,2,'start', '2021-10-12 10:00:00'), 
    (5,2,'task',  '2021-10-12 10:20:00');

Solution

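  • You can use the SUM() window function to assign a number to each session and then aggregate. A running SUM(activity = 'start') increases by 1 at every 'start' row, so all rows of one session share the same number: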

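    SELECT DISTINCT user_id,
           -- inner aggregate: duration of each session; outer window: sum of those durations per user
           SUM(TIMESTAMPDIFF(MINUTE, MIN(log_time), MAX(log_time))) OVER (PARTITION BY user_id) AS total_time
    FROM (
      -- grp: running count of 'start' rows per user and day
      SELECT *, SUM(activity = 'start') OVER (PARTITION BY user_id, DATE(log_time) ORDER BY log_time) AS grp
      FROM activities
    ) t
    WHERE grp > 0  -- ignore rows logged before the first 'start' of the day
    GROUP BY user_id, DATE(log_time), grp;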
    

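    To make the individual steps easier to inspect, the same logic can be unrolled into named stages. This is only a sketch for MariaDB 10.5 against the activities table from the question; the CTE names (numbered, sessions) and the aliases log_date and session_minutes are illustrative and not part of the original answer.

```sql
-- Step 1: running count of 'start' rows per user and day.
-- Every row from one 'start' up to the next gets the same grp value;
-- rows logged before the first 'start' of a day get grp = 0.
WITH numbered AS (
  SELECT user_id,
         log_time,
         DATE(log_time) AS log_date,
         SUM(activity = 'start') OVER (
           PARTITION BY user_id, DATE(log_time)
           ORDER BY log_time
         ) AS grp
  FROM activities
),
-- Step 2: one row per session with its duration in minutes.
-- A session that contains only a 'start' row has a duration of 0.
sessions AS (
  SELECT user_id,
         log_date,
         grp,
         TIMESTAMPDIFF(MINUTE, MIN(log_time), MAX(log_time)) AS session_minutes
  FROM numbered
  WHERE grp > 0            -- skip rows that precede any 'start' on that day
  GROUP BY user_id, log_date, grp
)
-- Step 3: add up the session durations per user.
SELECT user_id,
       SUM(session_minutes) AS total_time
FROM sessions
GROUP BY user_id
ORDER BY user_id;
```

    With the sample rows from the question this should return a total_time of 60 for user 1 and 20 for user 2. Replacing the final SELECT with SELECT * FROM sessions lists the per-session durations, which can help when checking that the grouping behaves as intended on real data.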