oracle-databasedatetimetimestampwhere-clausesysdate

Dynamically excluding first 5 minutes of data each hour of loading


Data gets fetched from the below "CLAIMANT" table to load into another table. This happens every hour (6 am, 7am, 8am, etc.)

Say I need to write a WHERE CLAUSE that excludes the first 5 minutes of data from being loaded, each time the above operation occurs. The where clause will be added to the overall script that handles the operation

I have tried this, but it excludes 0 records, so it's not detecting the minute by minute differences of the "Filed_Date"

where FILED_DATE not between SYSDATE 
and ( TRUNC( SYSDATE, 'hh24' ) + interval '5' minute )

What is the correct way to write the where clause to exclude the data I need to exclude

enter image description here


Solution

  • Your original predicate has a start time and end time out of sync. If you run the query at 2:00 (sysdate), then the start time for your range is SYSDATE (2:00) and the end time is (TRUNC(SYSDATE,'HH24')+interval '5' minute), or 2:05. If you run the query after 14:05, then your start time will actually be after your end time.

    select to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI') query_date, 
    to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI') start_time,
    (trunc(to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI'),'HH24') + interval '5' minute) stop_time
    from dual
    union
    select to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI') query_date, 
    to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI') start_time,
    (trunc(to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI'),'HH24') + interval '5' minute) stop_time
    from dual;
    

    Results:

    QUERY_DATE           START_TIME           STOP_TIME            
    -------------------- -------------------- -------------------- 
    2021-03-03T14:00:00Z 2021-03-03T14:00:00Z 2021-03-03T14:05:00Z 
    2021-03-03T14:10:00Z 2021-03-03T14:10:00Z 2021-03-03T14:05:00Z 
    

    What you want is the data from 13:05 to 14:00, but the only data you are excluding is data you haven't received yet (after 14:00), so you get everything up to 14:00, or no data at all because of the mismatched start and stop times of your interval.

    Try this, assuming the query runs on or after the top of the hour and you want to capture everything after the first five minutes of the previous hour up to the top of the current hour (i.e. the last 55 minutes of the previous hour):

    where filed_date  
          between (trunc(sysdate,'HH24') - interval '55' minute) -- start time
          and (trunc(sysdate,'HH24'))                            -- end time
    

    Now if the job time (sysdate) is any time between 14:00 and 14:59, then start time for the predicate is always 13:05 and stop time is always 14:00.

    select to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI') query_date,
    (trunc(to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI'),'HH24') - interval '55' minute) start_time,
    (trunc(to_date('2021-03-03 14:00','YYYY-MM-DD HH24:MI'),'HH24')) stop_time
    from dual
    union
    select to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI') query_date,
    (trunc(to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI'),'HH24') - interval '55' minute) start_time,
    (trunc(to_date('2021-03-03 14:10','YYYY-MM-DD HH24:MI'),'HH24')) stop_time
    from dual;
    

    Results:

    QUERY_DATE           START_TIME           STOP_TIME            
    -------------------- -------------------- -------------------- 
    2021-03-03T14:00:00Z 2021-03-03T13:05:00Z 2021-03-03T14:00:00Z 
    2021-03-03T14:10:00Z 2021-03-03T13:05:00Z 2021-03-03T14:00:00Z 
    

    In general it is better to stay with positive assertions (i.e. BETWEEN); negative conditions (i.e. NOT BETWEEN) tend to discourage the CBO from using indexes, if they exist.

    Alternatively, if you ever wanted a larger timeframe and always wanted to exclude the first five minutes of any hour that happened to fall in the range, you could do this:

    where filed_date
          between (sysdate - interval '3' hour) and sysdate
      and to_number(to_char(filed_date,'MI'),'99') > 5
    

    This would limit the minutes portion of filed_date to 6 or more, effectively filtering out the first five minutes of any hour in the range.