sqloracle-database

SQL Join Over Non-Unique Data - Windowing/Over/Order By/Distinct


I've lurked long enough to know you folks will nail this, please pardon noobness:

Given some sample data:

Table JOBS

JOBNAME JOBCOUNT STEPTOTAL STRTDATE STRTTIME
MIS_JOB_1 23013400 15 20240429 230142
MIS_JOB_1 23013400 15 20240903 230137

Table JOBSTEPS

JOBNAME JOBCOUNT STEPCOUNT SDLDATE SDLTIME
MIS_JOB_1 23013400 1 20240429 230142
MIS_JOB_1 23013400 2 20240429 230143
MIS_JOB_1 23013400 3 20240429 230143
MIS_JOB_1 23013400 4 20240429 230144
MIS_JOB_1 23013400 5 20240429 230145
MIS_JOB_1 23013400 6 20240429 230146
MIS_JOB_1 23013400 7 20240429 230147
MIS_JOB_1 23013400 8 20240429 230147
MIS_JOB_1 23013400 9 20240429 230148
MIS_JOB_1 23013400 10 20240429 230149
MIS_JOB_1 23013400 11 20240429 230150
MIS_JOB_1 23013400 12 20240429 230150
MIS_JOB_1 23013400 13 20240429 230151
MIS_JOB_1 23013400 14 20240429 230152
MIS_JOB_1 23013400 15 20240429 230153
MIS_JOB_1 23013400 1 20240903 230137
MIS_JOB_1 23013400 2 20240903 230138
MIS_JOB_1 23013400 3 20240903 230139
MIS_JOB_1 23013400 4 20240903 230140
MIS_JOB_1 23013400 5 20240903 230140
MIS_JOB_1 23013400 6 20240903 230141
MIS_JOB_1 23013400 7 20240903 230142
MIS_JOB_1 23013400 8 20240903 230143
MIS_JOB_1 23013400 9 20240903 230144
MIS_JOB_1 23013400 10 20240903 230145
MIS_JOB_1 23013400 11 20240903 230146
MIS_JOB_1 23013400 12 20240903 230147
MIS_JOB_1 23013400 13 20240903 230148
MIS_JOB_1 23013400 14 20240903 230148
MIS_JOB_1 23013400 15 20240903 230149
select
a.JOBNAME,
a.JOBCOUNT,
a.STEPTOTAL,
b.STEPCOUNT,
b.PROGNAME,
b.EXTCMD,
b.SDLDATE,
b.SDLTIME,
a.SDLUNAME,
a.STRTDATE,
a.STRTTIME
from JOBS a inner join JOBSTEPS b on a.JOBNAME = b.JOBNAME and a.JOBCOUNT = b.JOBCOUNT where b.SDLDATE = a.STRTDATE

Need better join to perhaps involve a.STRTDATE, b.SDLDATE, a.STEPTOTAL, b.STEPCOUNT

I have tried joins like above by pushing SDLDATE out: however there are cases when JOBNAME, JOBCOUNT can be re-used again the next day - and other cases when SDLDATE for a given set of JOBSTEPS can span several days. So I either lose data or join too much to earlier JOBS records.


Solution

  • Provided you do not have overlapping sequences of steps, from Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern patching to find the steps within a sequence and set the start date for each set of steps (even if there are multiple on the same day or if they go over multiple days). Once you have found that then you can use it to join the two tables:

    SELECT j.*,
           MAX(s.sdldate) OVER (PARTITION BY s.jobname, s.jobcount, s.match) AS enddate,
           s.stepcount,
           s.sdldate
    FROM   jobsteps
           MATCH_RECOGNIZE(
             PARTITION BY jobname, jobcount
             ORDER BY sdldate
             MEASURES
               step_one.sdldate AS strtdate,
               match_number()   AS match
             ALL ROWS PER MATCH
             PATTERN ( step_one next_step* )
             DEFINE
               step_one AS stepcount = 1,
               next_step AS stepcount = PREV(stepcount) + 1
           ) s
           INNER JOIN jobs j
           ON (   j.jobname   = s.jobname
              AND j.jobcount  = s.jobcount
              AND j.strtdate  = s.strtdate
              )
    ORDER BY
           j.jobname, j.jobcount, s.sdldate
    

    Which, for the sample data:

    CREATE TABLE jobs (jobname, jobcount, steptotal, strtdate) AS
    SELECT 'MIS_JOB_1', 23013400, 5, DATE '2024-04-29' + INTERVAL '23:01:42' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 'MIS_JOB_1', 23013400, 5, DATE '2024-09-03' + INTERVAL '23:59:58' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 'MIS_JOB_1', 23013400, 3, DATE '2024-09-04' + INTERVAL '01:00:00' HOUR TO SECOND FROM DUAL;
    
    CREATE TABLE jobsteps (jobname, jobcount, stepcount, sdldate) AS
    SELECT 'MIS_JOB_1', 23013400,  1,  DATE '2024-04-29' + INTERVAL '23:01:42' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  2,  DATE '2024-04-29' + INTERVAL '23:01:43' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  3,  DATE '2024-04-29' + INTERVAL '23:01:43' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  4,  DATE '2024-04-29' + INTERVAL '23:01:44' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  5,  DATE '2024-04-29' + INTERVAL '23:01:45' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  1,  DATE '2024-09-03' + INTERVAL '23:59:58' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  2,  DATE '2024-09-03' + INTERVAL '23:59:59' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  3,  DATE '2024-09-03' + INTERVAL '23:59:59' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  4,  DATE '2024-09-04' + INTERVAL '00:00:00' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  5,  DATE '2024-09-04' + INTERVAL '00:00:01' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT 'MIS_JOB_1', 23013400,  1,  DATE '2024-09-04' + INTERVAL '01:00:00' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  2,  DATE '2024-09-04' + INTERVAL '01:00:01' HOUR TO SECOND FROM DUAL UNION ALL 
    SELECT 'MIS_JOB_1', 23013400,  3,  DATE '2024-09-04' + INTERVAL '01:00:02' HOUR TO SECOND FROM DUAL
    

    Note: A DATE data-type has both date and time components. Don't store dates and times in separate columns (and especially not as strings), instead use a single DATE column containing both components. If you must store the components as two columns (please don't) then use TO_DATE(date_column || time_column, 'yyyymmddhh24miss') to convert them to a date and then use the query above).

    Outputs:

    JOBNAME JOBCOUNT STEPTOTAL STRTDATE ENDDATE STEPCOUNT SDLDATE
    MIS_JOB_1 23013400 5 2024-04-29 23:01:42 2024-04-29 23:01:45 1 2024-04-29 23:01:42
    MIS_JOB_1 23013400 5 2024-04-29 23:01:42 2024-04-29 23:01:45 2 2024-04-29 23:01:43
    MIS_JOB_1 23013400 5 2024-04-29 23:01:42 2024-04-29 23:01:45 3 2024-04-29 23:01:43
    MIS_JOB_1 23013400 5 2024-04-29 23:01:42 2024-04-29 23:01:45 4 2024-04-29 23:01:44
    MIS_JOB_1 23013400 5 2024-04-29 23:01:42 2024-04-29 23:01:45 5 2024-04-29 23:01:45
    MIS_JOB_1 23013400 5 2024-09-03 23:59:58 2024-09-04 00:00:01 1 2024-09-03 23:59:58
    MIS_JOB_1 23013400 5 2024-09-03 23:59:58 2024-09-04 00:00:01 2 2024-09-03 23:59:59
    MIS_JOB_1 23013400 5 2024-09-03 23:59:58 2024-09-04 00:00:01 3 2024-09-03 23:59:59
    MIS_JOB_1 23013400 5 2024-09-03 23:59:58 2024-09-04 00:00:01 4 2024-09-04 00:00:00
    MIS_JOB_1 23013400 5 2024-09-03 23:59:58 2024-09-04 00:00:01 5 2024-09-04 00:00:01
    MIS_JOB_1 23013400 3 2024-09-04 01:00:00 2024-09-04 01:00:02 1 2024-09-04 01:00:00
    MIS_JOB_1 23013400 3 2024-09-04 01:00:00 2024-09-04 01:00:02 2 2024-09-04 01:00:01
    MIS_JOB_1 23013400 3 2024-09-04 01:00:00 2024-09-04 01:00:02 3 2024-09-04 01:00:02

    fiddle