Data gets fetched from the below "CLAIMANT" table to load into another table. This happens every hour (6 am, 7am, 8am, etc.)
Say I need to write a WHERE CLAUSE that excludes the first 5 minutes of data from being loaded, each time the above operation occurs. The where clause will be added to the overall script that handles the operation
I have tried this, but it excludes 0 records, so it's not detecting the minute by minute differences of the "Filed_Date"
where FILED_DATE not between SYSDATE
and ( TRUNC( SYSDATE, 'hh24' ) + interval '5' minute )
What is the correct way to write the where clause to exclude the data I need to exclude
Your original predicate has a start time and end time out of sync. If you run the query at 2:00 (sysdate), then the start time for your range is SYSDATE (2:00) and the end time is (TRUNC(SYSDATE,'HH24')+interval '5' minute), or 2:05. If you run the query after 14:05, then your start time will actually be after your end time.
select to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI') query_date,
to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI') start_time,
(trunc(to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI'),'HH24') + interval '5' minute) stop_time
from dual
union
select to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI') query_date,
to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI') start_time,
(trunc(to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI'),'HH24') + interval '5' minute) stop_time
from dual;
Results:
QUERY_DATE START_TIME STOP_TIME
-------------------- -------------------- --------------------
2021-03-03T14:00:00Z 2021-03-03T14:00:00Z 2021-03-03T14:05:00Z
2021-03-03T14:10:00Z 2021-03-03T14:10:00Z 2021-03-03T14:05:00Z
What you want is the data from 13:05 to 14:00, but the only data you are excluding is data you haven't received yet (after 14:00), so you get everything up to 14:00, or no data at all because of the mismatched start and stop times of your interval.
Try this, assuming the query runs on or after the top of the hour and you want to capture everything after the first five minutes of the previous hour up to the top of the current hour (i.e. the last 55 minutes of the previous hour):
where filed_date
between (trunc(sysdate,'HH24') - interval '55' minute) -- start time
and (trunc(sysdate,'HH24')) -- end time
Now if the job time (sysdate) is any time between 14:00 and 14:59, then start time for the predicate is always 13:05 and stop time is always 14:00.
select to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI') query_date,
(trunc(to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI'),'HH24') - interval '55' minute) start_time,
(trunc(to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI'),'HH24')) stop_time
from dual
union
select to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI') query_date,
(trunc(to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI'),'HH24') - interval '55' minute) start_time,
(trunc(to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI'),'HH24')) stop_time
from dual;
Results:
QUERY_DATE START_TIME STOP_TIME
-------------------- -------------------- --------------------
2021-03-03T14:00:00Z 2021-03-03T13:05:00Z 2021-03-03T14:00:00Z
2021-03-03T14:10:00Z 2021-03-03T13:05:00Z 2021-03-03T14:00:00Z
In general it is better to stay with positive assertions (i.e. BETWEEN); negative conditions (i.e. NOT BETWEEN) tend to discourage the CBO from using indexes, if they exist.
Alternatively, if you ever wanted a larger timeframe and always wanted to exclude the first five minutes of any hour that happened to fall in the range, you could do this:
where filed_date
between (sysdate - interval '3' hour) and sysdate
and to_number(to_char(filed_date,'MI'),'99') > 5
This would limit the minutes portion of filed_date to 6 or more, effectively filtering out the first five minutes of any hour in the range.