I'm trying to find a solution for a case at work that basically is:
We have a database that registers all the api requests and responses. We need to match the number of requests that are charged by the api owner and the number on the database.
We have an id column, but it's not unique, a document number column and the timestamp of when the request was made.
The tricky part is: If we make a request for an specific document, it'll charge for that request, but in the period of 24 hours, if we make the same request for that document, it won't be charged. After that 24h period from the first request, if we make a new request, it'll count as the "first one" again will be charged, so another 24h period starts again and new requests for that document won't be charged during that window frame.
Example:
Id | Document Number | Timestamp |
---|---|---|
id1 | doc1 | 2024-01-01 00:00:00 |
id2 | doc2 | 2024-01-01 00:00:00 |
id3 | doc1 | 2024-01-01 04:30:00 |
id4 | doc1 | 2024-01-02 01:00:00 |
In the example, doc 1 was called 3 times. The first one (id1) will be charged and starts the 24h window span, the second (id2) will also be charge since the doc is not the same. The third one (id3) WON'T be charged since it's still on the window span, but the fourth (id4) will be charged again since it's after the 24h from id1.
I'm trying to think on how to do that, maybe using ranking to sort it by date and document, but since we have that 24h rule, I've not been able to find a solution.
Here's the solution I came up with:
Row Number
based on the order of the DOC_NUM
, REQUEST_DATETIME
and Id
columns.RECURSIVE CTE
to calculate the hours between each document request and add those hours to a running total (RS_HOURS_LAST_REQUEST
column).RS_HOURS_LAST_REQUEST
exceeds 24 hours then reset the running total back to zero.The RS_HOURS_LAST_REQUEST
column will return zero if the row is the first time the document was ever requested, or if 24 hours have passed since the document was first requested.
Important Notes
WHERE a.RS_HOURS_LAST_REQUEST = 0
and analyze the output to further understand how this is working. You can un-comment when you want the final result.Schema (PostgreSQL v15)
CREATE TABLE api_doc_requests (
id text,
doc_num text,
request_datetime timestamp
);
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id1','doc1','2024-01-01 00:00:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id2','doc2','2024-01-01 00:00:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id3','doc1','2024-01-01 04:30:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id4','doc1','2024-01-02 01:00:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id5','doc1','2024-01-01 06:00:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id6','doc1','2024-01-01 10:30:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id7','doc2','2024-01-01 03:30:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id8','doc2','2024-01-01 11:30:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id9','doc1','2024-01-02 06:00:00');
INSERT INTO api_doc_requests (id, doc_num, request_datetime) VALUES ('id10','doc1','2024-01-02 10:30:00');
Query #1
WITH RECURSIVE CTE_DOC_REQUESTS AS
(SELECT A.*
FROM
(SELECT A.ID,
A.DOC_NUM,
A.REQUEST_DATETIME,
ROW_NUMBER() OVER (PARTITION BY A.DOC_NUM
ORDER BY A.REQUEST_DATETIME,
A.ID) AS ROW_NUM,
ROUND(0.00, 2) AS HOURS_LAST_REQUEST,
ROUND(0.00, 2) AS RS_HOURS_LAST_REQUEST
FROM API_DOC_REQUESTS A) A
WHERE A.ROW_NUM = 1
UNION SELECT A.ID,
A.DOC_NUM,
A.REQUEST_DATETIME,
A.ROW_NUM,
ROUND(EXTRACT(EPOCH
FROM A.REQUEST_DATETIME - B.REQUEST_DATETIME) / 3600, 2) AS HOURS_LAST_REQUEST,
CASE
WHEN ROUND(EXTRACT(EPOCH
FROM A.REQUEST_DATETIME - B.REQUEST_DATETIME) / 3600, 2) + B.RS_HOURS_LAST_REQUEST >= 24 THEN 0.00
ELSE ROUND(EXTRACT(EPOCH
FROM A.REQUEST_DATETIME - B.REQUEST_DATETIME) / 3600, 2) + B.RS_HOURS_LAST_REQUEST
END AS RS_HOURS_LAST_REQUEST
FROM
(SELECT A.ID,
A.DOC_NUM,
A.REQUEST_DATETIME,
ROW_NUMBER() OVER (PARTITION BY A.DOC_NUM
ORDER BY A.REQUEST_DATETIME,
A.ID) AS ROW_NUM
FROM API_DOC_REQUESTS A) A
INNER JOIN CTE_DOC_REQUESTS B ON A.DOC_NUM = B.DOC_NUM
AND A.ROW_NUM = B.ROW_NUM + 1)
SELECT *
FROM CTE_DOC_REQUESTS A
WHERE A.RS_HOURS_LAST_REQUEST = 0
ORDER BY A.DOC_NUM,
A.REQUEST_DATETIME,
A.ID;
id | doc_num | request_datetime | row_num | hours_last_request | rs_hours_last_request |
---|---|---|---|---|---|
id1 | doc1 | 2024-01-01T00:00:00.000Z | 1 | 0.00 | 0.00 |
id4 | doc1 | 2024-01-02T01:00:00.000Z | 5 | 14.50 | 0.00 |
id2 | doc2 | 2024-01-01T00:00:00.000Z | 1 | 0.00 | 0.00 |