sqloracle-database

Query to Count Records in a 1-Hour Sliding Window


I have a table where each record represents a transaction by a user. I need to find out if any user has inserted 1000 or more records within a 1 hour time window on a specific date. For example, if a user inserts more than 1000 records between 12:15 PM to 01:15 PM, I want to capture that information.

user_code timestamp
John 2024-12-01 01:17:05
John 2024-12-01 01:17:35
John 2024-12-01 01:18:05
John 2024-12-01 01:18:35
John 2024-12-01 01:19:05
John 2024-12-01 01:19:35
John 2024-12-01 01:20:05
Alex 2024-12-01 01:30:15
Alex 2024-12-01 01:30:45
Alex 2024-12-01 01:31:15
Alex 2024-12-01 01:31:45
Alex 2024-12-01 01:32:15
Alex 2024-12-01 01:32:45
Alex 2024-12-01 01:33:15

What I've Tried:

I've tried following a basic approach with the help of ChatGPT, but the query generated, even by AI, is not working as expected. The sliding window calculation seems to be incorrect in its handling, and I’m not able to capture the data as intended.

Approach:

  1. For each user, calculate a sliding window (1 hour before and 1 hour after each transaction).
  2. Count the number of records within each window.
  3. Return only the windows where the count of records is 1000 or more.

I am looking for help to write the correct Oracle SQL query to implement the above logic. If anyone has any suggestions, or a correct query to achieve this, it would be greatly appreciated!


Solution

  • Use an analytic function with a range window:

    SELECT user_code,
           timestamp
    FROM   (
      SELECT user_code,
             timestamp,
             COUNT(*) OVER (
               PARTITION BY user_code
               ORDER BY timestamp
               RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
                         AND INTERVAL '1' HOUR FOLLOWING
             ) AS cnt
      FROM   table_name
    )
    WHERE  cnt >= 1000
    

    Which, for the sample data:

    CREATE TABLE table_name (user_code, timestamp) AS
    SELECT 'Alice',
            TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
    FROM    DUAL
    CONNECT BY LEVEL <= 1003
    UNION ALL
    SELECT 'Beryl',
            TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
    FROM    DUAL
    CONNECT BY LEVEL <= 500
    UNION ALL
    SELECT 'Beryl',
            TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL + 501)
    FROM    DUAL
    CONNECT BY LEVEL <= 500
    UNION ALL
    SELECT 'Carol',
            TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
    FROM    DUAL
    CONNECT BY LEVEL <= 900
    
    

    Outputs the start of the hour ranges which contain 1000 or more rows:

    USER_CODE TIMESTAMP
    Alice 2024-01-01 00:00:00.000000000
    Alice 2024-01-01 00:00:03.000000000
    Alice 2024-01-01 00:00:06.000000000
    Alice 2024-01-01 00:00:09.000000000
    Beryl 2024-01-01 00:00:00.000000000

    If you want to return the 1-hour ranges then, from Oracle 12, you can use MATCH_RECOGNIZE:

    SELECT *
    FROM   table_name
           MATCH_RECOGNIZE(
             PARTITION BY user_code
             ORDER BY timestamp
             MEASURES
               FIRST(timestamp) AS timestamp_start,
               LAST(timestamp)  AS timestamp_end,
               COUNT(timestamp) AS cnt
             PATTERN (start_row within_1_hour+)
             DEFINE
               within_1_hour AS timestamp <= FIRST(timestamp) + INTERVAL '1' HOUR
           )
    WHERE  cnt >= 1000
    

    Which, for the sample data, outputs:

    USER_CODE TIMESTAMP_START TIMESTAMP_END CNT
    Alice 2024-01-01 00:00:00.000000000 2024-01-01 00:50:06.000000000 1003
    Beryl 2024-01-01 00:00:00.000000000 2024-01-01 00:50:03.000000000 1000

    fiddle