I would like to check a table for unexpected data time-gaps, grouped by a column. For example is there a query for an input table:
Datetime, Sym
(timestamp) (symbol)
2024-10-01 08:00:00 AA
2024-10-01 08:00:00 CC
2024-10-01 08:00:00 BB
...
2024-10-01 08:01:00 AA
2024-10-01 08:01:00 BB
2024-10-01 08:01:00 CC
...
2024-10-01 08:02:00 AA
2024-10-01 08:03:00 BB
2024-10-01 08:04:00 CC
which yields result:
Datetime, Sym, Time_gap
2024-10-01 08:04:00 CC 3
2024-10-01 08:03:00 BB 2
2024-10-01 08:02:00 AA 1
...
2024-10-01 08:01:00 AA 1
2024-10-01 08:01:00 BB 1
2024-10-01 08:01:00 CC 1
For each ‘Sym’ the time-gap between consecutive rows is calculated as ‘Time_gap’, with the timestamp of the event, and sorted DESC by ‘Time_gap’.
Note I showed ‘Time_gap’ as an integer (minutes), but any numeric value (seconds, timestamp) would suffice.
I know in other databases you can use the LAG
function, but I see in QuestDB there is no such a function and one can use instead the first_value
window function. My problem is when I try to pass a timestamp to that function I get an error because the timestamp
type is not supported.
We can indeed use the first_value()
window function to get the timestamp from the row before partitioned by Sym, but at the moment we need a workaround, as the function supports just a few data types. I opened a github issue about this, so it might be implemented in the future.
The workaround consists just in casting the timestamp as long (it is an epoch) and then back to a timestamp on a second query. I am showing how to calculate the gap in both seconds and milliseconds below just for completeness, but of course you need only one of them. I am using in one case the datediff
function and in the other just substracting both epochs and dividing to get ms.
WITH time_and_prev AS (
SELECT Datetime, Sym,
first_value(Datetime::long)
OVER (PARTITION BY Sym
ORDER BY Datetime
ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING
) AS prevTimestamp
FROM table where Datetime IN today()
)
SELECT Datetime, Sym, prevTimestamp::timestamp,
datediff('s', Datetime, prevTimestamp::timestamp) as gap_s,
(Datetime - prevTimestamp::timestamp) / 1000 as gap_ms
FROM time_and_prev
ORDER BY 4 DESC;