I have a huge table (a kind of audit log) with these columns: ID, TS, DATA.
ID is the primary key, and it is a number from a sequence.
TS is a timestamp and it is the current timestamp of the insert.
DATA is the useful data.
There is an index on the primary key (ID).
There is a guarantee that if record "B" has a greater ID than record "A", then the TS of "B" will be greater than or equal to the TS of "A".
My goal is to select the records for a given time interval. The time interval is always very short compared to the total time the table covers (days vs years).
There is no index on TS and I cannot create one (explanation: the audit table on the production system has a few thousand million rows, and the DBAs fear that a new index will unnecessarily slow down inserts into the log. Actually, my query will not be a frequent one and will use only a small part of the table).
If I simply ask
select *
from audit
where TS > 20220101 and TS < 20220102
then a full table scan happens which takes way too much time.
If I first find out the first ID for each day, and learn that the first ID for 20220101 is 123456 and the first ID for 20220102 is 145678, then I can ask
select *
from audit
where ID > 123456 and ID < 145678
which is quick because an index scan happens.
So it is obvious that instead of the full table scan I should somehow find out the first and last ID for the given time period and use them. It is also obvious that I can find the IDs quickly via binary search, because of the correlation between the IDs and the TSs. But I don't know how to do this in a SQL query, if it is possible at all.
So is it possible to make use of the ID index for this query? If yes, how?
Is it possible to somehow hint to the DB engine that there is a correlation between ID and TS?
Below are some options to solve your problem:
Parallelism:
SELECT /*+ PARALLEL(X) */ ...
where X is some reasonable number of threads. Be careful not to use too much parallelism. If your DBAs are scared of an index, they probably hate parallelism.
Interval partitioning on TS:
alter table audit_table
modify partition by range(ts) interval(numtodsinterval(1, 'DAY'))
(
--Pick the earliest day here:
partition p1 values less than (date '2020-01-01')
);
Zone map on TS:
create materialized zonemap audit_tabl_zm on audit_table(ts);
Binary search of the related index: Use one of the existing answers, possibly as part of a function or CTE, or possibly you can take the IDs returned and manually plug them into an existing query. While this works, it's not ideal because you shouldn't have to significantly change your queries to get good performance.
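For illustration, here is a minimal PL/SQL sketch of the binary-search idea, not a tested production implementation. It assumes the table is audit_table with columns id and ts, that TS never decreases as ID increases, and that IDs may have gaps. The function name first_id_at_or_after is made up for this example. Each probe is a MIN/MAX lookup on the primary key index plus one lookup by ID, so the whole search should need only a few dozen index probes.

```sql
create or replace function first_id_at_or_after(p_ts in timestamp)
  return number
is
  l_lo  number;     -- every existing row with id < l_lo has ts < p_ts
  l_hi  number;     -- an existing row whose ts >= p_ts
  l_mid number;
  l_id  number;
  l_ts  timestamp;
begin
  -- Two separate queries so each can use an index MIN/MAX lookup.
  select min(id) into l_lo from audit_table;
  select max(id) into l_hi from audit_table;

  if l_lo is null then
    return null;                       -- empty table
  end if;

  select ts into l_ts from audit_table where id = l_hi;
  if l_ts < p_ts then
    return l_hi + 1;                   -- no row at or after p_ts
  end if;

  while l_lo < l_hi loop
    l_mid := trunc((l_lo + l_hi) / 2); -- l_lo <= l_mid < l_hi

    -- IDs may have gaps: probe the last existing ID in [l_lo, l_mid].
    select max(id) into l_id
    from audit_table
    where id between l_lo and l_mid;

    if l_id is null then
      l_lo := l_mid + 1;               -- no rows in [l_lo, l_mid]
    else
      select ts into l_ts from audit_table where id = l_id;
      if l_ts >= p_ts then
        l_hi := l_id;                  -- answer is l_id or earlier
      else
        l_lo := l_mid + 1;             -- answer is after l_mid
      end if;
    end if;
  end loop;

  return l_hi;
end;
/
```

A possible way to use it for the half-open interval from 2022-01-01 to 2022-01-02. The scalar subqueries are there so that the function is more likely to be evaluated once per bound rather than once per row; check the execution plan to confirm that an index range scan on ID is used:

```sql
select *
from audit_table
where id >= (select first_id_at_or_after(timestamp '2022-01-01 00:00:00') from dual)
  and id <  (select first_id_at_or_after(timestamp '2022-01-02 00:00:00') from dual);
```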
Also, are you sure that the primary key and the timestamp are perfectly synchronized? Relying on sequence values to carry any meaning, even an ordering, can be dangerous. For example, if you have Real Application Clusters and the sequence was not explicitly set to ORDER, each instance will have a separate cache and the IDs will not be in order.
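Two quick checks along these lines, as a sketch only. The sequence name AUDIT_SEQ is an assumption (substitute the real one), and the ID range is simply the example range from the question. The first query shows whether the sequence was created with ORDER and how large its cache is. The second counts rows in a bounded ID range whose TS is earlier than the TS of the preceding ID; any non-zero count means the ID/TS ordering guarantee does not hold for that range.

```sql
-- ORDER_FLAG is 'Y' only if the sequence was created with ORDER.
select sequence_owner, sequence_name, cache_size, order_flag
from all_sequences
where sequence_name = 'AUDIT_SEQ';   -- hypothetical sequence name

-- Spot check a bounded ID range (index range scan, not a full scan).
select count(*) as out_of_order_rows
from (
  select ts,
         lag(ts) over (order by id) as prev_ts
  from audit_table
  where id between 123456 and 145678
)
where ts < prev_ts;
```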