I have a table where each record represents a transaction by a user. I need to find out if any user has inserted 1000 or more records within a 1 hour time window on a specific date. For example, if a user inserts more than 1000 records between 12:15 PM to 01:15 PM, I want to capture that information.
user_code | timestamp |
---|---|
John | 2024-12-01 01:17:05 |
John | 2024-12-01 01:17:35 |
John | 2024-12-01 01:18:05 |
John | 2024-12-01 01:18:35 |
John | 2024-12-01 01:19:05 |
John | 2024-12-01 01:19:35 |
John | 2024-12-01 01:20:05 |
Alex | 2024-12-01 01:30:15 |
Alex | 2024-12-01 01:30:45 |
Alex | 2024-12-01 01:31:15 |
Alex | 2024-12-01 01:31:45 |
Alex | 2024-12-01 01:32:15 |
Alex | 2024-12-01 01:32:45 |
Alex | 2024-12-01 01:33:15 |
What I've Tried:
I've tried following a basic approach with the help of ChatGPT, but the query generated, even by AI, is not working as expected. The sliding window calculation seems to be incorrect in its handling, and I’m not able to capture the data as intended.
Approach:
I am looking for help to write the correct Oracle SQL query to implement the above logic. If anyone has any suggestions, or a correct query to achieve this, it would be greatly appreciated!
Use an analytic function with a range window:
SELECT user_code,
timestamp
FROM (
SELECT user_code,
timestamp,
COUNT(*) OVER (
PARTITION BY user_code
ORDER BY timestamp
RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
AND INTERVAL '1' HOUR FOLLOWING
) AS cnt
FROM table_name
)
WHERE cnt >= 1000
Which, for the sample data:
CREATE TABLE table_name (user_code, timestamp) AS
SELECT 'Alice',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 1003
UNION ALL
SELECT 'Beryl',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 500
UNION ALL
SELECT 'Beryl',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL + 501)
FROM DUAL
CONNECT BY LEVEL <= 500
UNION ALL
SELECT 'Carol',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 900
Outputs the start of the hour ranges which contain 1000 or more rows:
USER_CODE | TIMESTAMP |
---|---|
Alice | 2024-01-01 00:00:00.000000000 |
Alice | 2024-01-01 00:00:03.000000000 |
Alice | 2024-01-01 00:00:06.000000000 |
Alice | 2024-01-01 00:00:09.000000000 |
Beryl | 2024-01-01 00:00:00.000000000 |
If you want to return the 1-hour ranges then, from Oracle 12, you can use MATCH_RECOGNIZE
:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY user_code
ORDER BY timestamp
MEASURES
FIRST(timestamp) AS timestamp_start,
LAST(timestamp) AS timestamp_end,
COUNT(timestamp) AS cnt
PATTERN (start_row within_1_hour+)
DEFINE
within_1_hour AS timestamp <= FIRST(timestamp) + INTERVAL '1' HOUR
)
WHERE cnt >= 1000
Which, for the sample data, outputs:
USER_CODE | TIMESTAMP_START | TIMESTAMP_END | CNT |
---|---|---|---|
Alice | 2024-01-01 00:00:00.000000000 | 2024-01-01 00:50:06.000000000 | 1003 |
Beryl | 2024-01-01 00:00:00.000000000 | 2024-01-01 00:50:03.000000000 | 1000 |