sql, mysql, running-count

MySQL: get a running count over time based on start and end timestamps


I have a workflows table with columns (id, started_at, ended_at), where id is the process ID.

How can I build a time series of the number of actively running processes at each timestamp from the data tabulated below?

Table of process timestamps:

id      started_at              ended_at
------- --------------------    --------------------
1203914 2023-04-20T04:54:29Z    2023-04-20T20:43:53Z
1197674 2023-04-20T06:00:28Z    2023-04-20T21:17:53Z
1212050 2023-04-20T18:47:29Z    0001-01-01T00:00:00Z
1198434 2023-04-22T18:16:53Z    2023-04-22T19:02:59Z
1210450 2023-04-22T19:06:53Z    2023-04-26T03:23:39Z
1210466 2023-04-23T05:34:53Z    2023-04-25T07:09:39Z
1201986 2023-04-24T06:30:53Z    2023-04-24T23:49:53Z
1200122 2023-04-24T17:22:53Z    2023-04-25T05:29:39Z
1209114 2023-04-25T01:07:53Z    2023-04-26T23:03:39Z
1198570 2023-04-25T01:10:53Z    2023-04-27T00:59:38Z

Expected running process counts:

timestamp               running_process_count
--------------------    ---------------------
2023-04-20T04:54:29Z    1
2023-04-20T06:00:28Z    2
2023-04-20T18:47:29Z    3
2023-04-22T18:16:53Z    1
2023-04-22T19:06:53Z    1
2023-04-23T05:34:53Z    2
2023-04-24T06:30:53Z    3
2023-04-24T17:22:53Z    4
2023-04-25T01:07:53Z    4

I'm looking for something similar to how it's done in:

R- Calculate a count of items over time using start and end dates

I can get counts of process IDs for a particular hour with the query under "What I have so far" below. However, what I'm looking for is a "running" process count per timestamp (the timestamp can be started_at), that is, the number of processes for which started_at < timestamp < ended_at.

Do I need to use MySQL window functions to achieve this (LAG, LEAD, PARTITION BY, etc.)? Apologies, I'm not familiar with advanced MySQL operators.

What I have so far:

SELECT   
  started_at,
  count(*) AS running_count
FROM workflows
GROUP BY 
  YEAR(started_at),
  MONTH(started_at),
  DAY(started_at),
  HOUR(started_at)
ORDER BY 
  YEAR(started_at),
  MONTH(started_at),
  DAY(started_at),
  HOUR(started_at);

Solution

  • Do a self-join and aggregate as follows:

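    -- For each start timestamp (t1), count the rows (t2) whose
    -- [started_at, ended_at] interval contains that timestamp.
    select t1.started_at,
      count(t2.id) as cnt
    from workflows t1 left join workflows t2
    on t1.started_at between t2.started_at and t2.ended_at
    group by t1.started_at
    order by t1.started_at;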
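
To see what the join condition does, it can help to count for a single timestamp first. The sketch below assumes the columns are DATETIME and uses one of the sample start times as a literal. The self-join effectively repeats this count once for every started_at value.

```sql
-- Count the processes whose [started_at, ended_at] interval contains
-- one fixed timestamp (a literal taken from the sample data).
SELECT COUNT(*) AS running_count
FROM workflows
WHERE '2023-04-24 17:22:53' BETWEEN started_at AND ended_at;
```

For the sample data this should return 4, in line with the expected table. BETWEEN is inclusive at both ends, so a process counts at its own start time, whereas the strict started_at < timestamp < ended_at from the question would exclude it. A row such as 1212050 never satisfies the condition if its ended_at placeholder is stored as a value earlier than its started_at, so it is counted neither at its own start nor afterwards.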
    

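
Since the question asks about window functions: a running count can also be written without a self-join, by turning every start into a +1 event and every end into a -1 event and taking a cumulative sum. This is only a sketch, assuming MySQL 8.0 or later (needed for CTEs and window functions). The names events, ts and delta are made up for illustration. It also assumes that an ended_at earlier than started_at (like the 0001-01-01 placeholder on row 1212050) means the process has not ended, so that row stays in the count from its start onward and later counts come out higher than with the self-join.

```sql
WITH events AS (
  -- every process start adds one running process
  SELECT started_at AS ts, 1 AS delta
  FROM workflows
  UNION ALL
  -- every real end removes one; placeholder ends are skipped
  SELECT ended_at AS ts, -1 AS delta
  FROM workflows
  WHERE ended_at >= started_at
)
SELECT DISTINCT ts, running_count
FROM (
  SELECT ts,
         delta,
         -- ordering starts before ends at the same ts keeps a process
         -- counted at the instant it ends, like BETWEEN does
         SUM(delta) OVER (ORDER BY ts, delta DESC) AS running_count
  FROM events
) AS e
WHERE delta = 1   -- report only the start timestamps
ORDER BY ts;
```

If placeholder rows should not be counted at all, filter them out of both halves of the UNION ALL. The self-join compares every row with every other row, while this version sorts the events once, which may matter on large tables.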