oracle-database performance stored-procedures plsql bulk-operations

How can I run this stored procedure in batch mode, or run it with parallel processing?


We are iterating over 100k+ records from a global temporary table. The stored procedure below iterates over all records from the global temporary table one by one and has to perform the following three steps for each:

  1. Check whether the product exists.
  2. Check whether the assets under the product have the 'category' applied.
  3. Check whether those assets have file names matching '%pdf%'.

So each record goes through these three steps, and for each successful record the final document names are stored in the table. If an error occurs in any of the steps, the error message is stored for that record instead.

The stored procedure below is taking a long time because it processes records sequentially.

  1. Is there any way to make this faster within the stored procedure itself by batch processing?
  2. If that is not possible in the stored procedure, can we rewrite this code in Java and run it multi-threaded? For example, create 10 threads where each thread takes one record concurrently and processes it. I would be happy if somebody could give some pseudo code.

Which approach would you suggest?

DECLARE
V_NODE_ID  VARCHAR2(20);
V_FILENAME VARCHAR2(100);
V_CATEGORY_COUNT INTEGER :=0;  
FINAL_FILNAME VARCHAR2(2000);
V_FINAL_ERRORMESSAGE VARCHAR2(2000);


CURSOR C1 IS
SELECT isbn FROM GT_ADD_ISBNS GT;

CURSOR C2(v_isbn in varchar2) IS
SELECT ANP.NODE_ID NODE_ID
        FROM 
        table1 ANP,
        table2 ANPP,
        table3 AN
        WHERE 
      ANP.NODE_ID=AN.ID AND
    ANPP.NODE_ID=ANP.NODE_ID AND
    AN.NAME_ID =26 AND
    ANP.CATEGORY='category' AND
    ANP.QNAME_ID='categories'  AND
        ANP.NODE_ID IN(SELECT CHILD_NODE_ID 
                  FROM TABLE_ASSOC START WITH PARENT_NODE_ID IN(v_isbn) 
                      CONNECT BY PRIOR CHILD_NODE_ID = PARENT_NODE_ID);


BEGIN
--Iterating all Products
FOR R1 IN C1 
LOOP

FINAL_FILNAME :='';
BEGIN


--To check whether Product is exists or not
SELECT AN.ID INTO V_NODE_ID 
FROM TABLE1 AN,
TABLE2 ANP
WHERE
AN.ID=ANP.NODE_ID AND
ANP.VALUE in(R1.ISBN);


V_CATEGORY_COUNT :=0;
V_FINAL_ERRORMESSAGE :='';

--To check whether the assets under the Product have the 'category' applied or not
FOR R2 IN C2(R1.ISBN) 
LOOP

V_CATEGORY_COUNT := V_CATEGORY_COUNT+1;  

BEGIN
--These assets have the 'category' applied; check whether they have documents LIKE '%pdf%' or not
SELECT ANP.STRING_VALUE  into V_FILENAME
        FROM 
        table1 ANP,
        table2 ANPP,
        table3 ACD
        WHERE 
       ANP.QNAME_ID=21  AND 
       ACD.ID=ANPP.LONG_VALUE AND
       ANP.NODE_ID=ANPP.NODE_ID AND
       ANPP.QNAME_ID=36 AND
       ANP.STRING_VALUE LIKE '%pdf%'  AND 
       ANP.NODE_ID=R2.NODE_ID; 

    FINAL_FILNAME := FINAL_FILNAME  || V_FILENAME ||',';

   EXCEPTION WHEN
     NO_DATA_FOUND THEN
     V_FINAL_ERRORMESSAGE:=V_FINAL_ERRORMESSAGE|| 'Category is applied for this Product But for the asset:'||  R2.NODE_ID || ':Documents[LIKE %pdf%] were not found ;';
     UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE=  V_FINAL_ERRORMESSAGE  WHERE ISBN= R1.ISBN;


     END;--Iterating for each NODEID

END LOOP;--Iterating the assets[Nodes] for each product of category

  --  DBMS_OUTPUT.PUT_LINE('R1.ISBN:' || R1.ISBN ||'::V_CATEGORY_COUNT:' || V_CATEGORY_COUNT);

 IF(V_CATEGORY_COUNT  = 0) THEN
     UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE=  'Category is not applied to any of the Assets for this Product'  WHERE ISBN= R1.ISBN;
   END IF;  


EXCEPTION WHEN
NO_DATA_FOUND THEN
      UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE=   'Product is not Found:' WHERE ISBN= R1.ISBN;
END;

  -- DBMS_OUTPUT.PUT_LINE( R1.ISBN || 'Final documents:'||FINAL_FILNAME);
      UPDATE GT_ADD_ISBNS SET FILENAME=FINAL_FILNAME WHERE ISBN= R1.ISBN;

COMMIT;
END LOOP;--looping gt_isbns
END;

Solution

  • You have a number of potential performance hits. Here's one:

    "We are iterating 100k+ records from global temporary table"

    Global temporary tables can be pretty slow. Populating them means writing all that data to disk; reading from them means reading from disk. That's a lot of I/O which might be avoidable. Also, GTTs use the temporary tablespace so you may be in contention with other sessions doing large sorts.

    Here's another red flag:

    FOR R1 IN C1 LOOP
    ... FOR R2 IN C2(R1.ISBN) LOOP

    SQL is a set-based language. It is optimised for joining tables and returning sets of data in a highly performant fashion. Nested cursor loops mean row-by-row processing, which is undoubtedly easier to code but may be orders of magnitude slower than the equivalent set operation.

    --To check whether Product is exists or not

    You have several queries selecting from the same tables (`AN`, `ANP`) using the same criteria (`isbn`). Perhaps all these duplicate look-ups are the only way of validating your business rules, but it seems unlikely.

    FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';

    Maybe you could rewrite your query to use `listagg()` instead of using procedural logic to concatenate a string?
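
    A sketch of that idea, reusing the (assumed) table and column names from the question's inner query:

    ```sql
    -- Hypothetical: aggregate all matching pdf file names per node in one pass,
    -- instead of concatenating V_FILENAME row by row in PL/SQL.
    SELECT ANP.NODE_ID,
           LISTAGG(ANP.STRING_VALUE, ',')
             WITHIN GROUP (ORDER BY ANP.STRING_VALUE) AS final_filename
    FROM   table1 ANP
    JOIN   table2 ANPP ON ANPP.NODE_ID = ANP.NODE_ID
    WHERE  ANP.QNAME_ID  = 21
    AND    ANPP.QNAME_ID = 36
    AND    ANP.STRING_VALUE LIKE '%pdf%'
    GROUP  BY ANP.NODE_ID;
    ```

    Note that `LISTAGG` requires Oracle 11gR2 or later; on older versions you would need `XMLAGG` or a user-defined aggregate instead.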

    UPDATE GT_ADD_ISBNS

    Again, all your updates are single row operations instead of set ones.
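
    For example, the 'Product is not Found' update could (in outline) become one set-based statement, assuming the existence check can be expressed as a `NOT EXISTS` against the same hypothetical tables:

    ```sql
    -- Hypothetical set-based version: flag every ISBN in the GTT that has
    -- no matching product row, in a single statement instead of 100k+ updates.
    UPDATE GT_ADD_ISBNS GT
    SET    GT.ERROR_MESSAGE = 'Product is not Found:'
    WHERE  NOT EXISTS (
             SELECT 1
             FROM   table1 AN
             JOIN   table2 ANP ON ANP.NODE_ID = AN.ID
             WHERE  ANP.VALUE = GT.ISBN
           );
    ```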

    "Is there any way to make this process faster in the stored procedure itself by doing batch process?"

    Without knowing your rules and the context we cannot rewrite your logic for you, but 15-16 hours is way too long for this so you can definitely reduce the elapsed time.

    Things to consider:

    1. Replace writing to and reading from the temporary table with the query you use to populate it.
    2. Rewrite the loops to use BULK COLLECT with a high LIMIT (e.g. 1000) to improve the efficiency of the selects.
    3. Populate arrays and use FORALL to improve the efficiency of the updates.
    4. Try to remove all those individual look-ups by incorporating the logic into the main query, using OUTER JOIN syntax to test for existence.
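
    Points 2 and 3 could look something like this in outline (the per-ISBN logic is elided; the collection types are based on the columns of `GT_ADD_ISBNS` from the question):

    ```sql
    DECLARE
      TYPE t_isbn_tab  IS TABLE OF GT_ADD_ISBNS.ISBN%TYPE;
      TYPE t_fname_tab IS TABLE OF GT_ADD_ISBNS.FILENAME%TYPE;
      l_isbns  t_isbn_tab;
      l_fnames t_fname_tab;
      CURSOR c_isbns IS SELECT isbn FROM GT_ADD_ISBNS;
    BEGIN
      OPEN c_isbns;
      LOOP
        -- Fetch in chunks of 1000 instead of one row at a time
        FETCH c_isbns BULK COLLECT INTO l_isbns LIMIT 1000;
        EXIT WHEN l_isbns.COUNT = 0;

        l_fnames := t_fname_tab();
        l_fnames.EXTEND(l_isbns.COUNT);
        FOR i IN 1 .. l_isbns.COUNT LOOP
          -- ... your per-ISBN logic here, populating l_fnames(i) ...
          l_fnames(i) := NULL;  -- placeholder
        END LOOP;

        -- One context switch for the whole batch of updates
        FORALL i IN 1 .. l_isbns.COUNT
          UPDATE GT_ADD_ISBNS
          SET    FILENAME = l_fnames(i)
          WHERE  ISBN     = l_isbns(i);
        COMMIT;
      END LOOP;
      CLOSE c_isbns;
    END;
    /
    ```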

    These are all guesses. If you really want to know where the procedure is spending its time - and that knowledge is the root of all successful tuning, so you ought to want to know - you should run the procedure under a PL/SQL profiler. This will tell you which lines cost the most time, and those are usually the ones where you need to focus your tuning effort. If you don't already have access to DBMS_PROFILER you will need a DBA to run the install script for you.
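
    Once DBMS_PROFILER is installed, wrapping a run takes only a couple of calls (the run comment below is arbitrary):

    ```sql
    -- Profile one execution; results land in the PLSQL_PROFILER_* tables.
    EXEC DBMS_PROFILER.START_PROFILER('isbn batch run');
    -- ... execute the procedure under test here ...
    EXEC DBMS_PROFILER.STOP_PROFILER;

    -- Then query the per-line timings, most expensive first:
    SELECT u.unit_name, d.line#, d.total_occur, d.total_time
    FROM   plsql_profiler_units u
    JOIN   plsql_profiler_data  d ON d.runid = u.runid
                                 AND d.unit_number = u.unit_number
    ORDER  BY d.total_time DESC;
    ```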

    "Can we change this code into Java and run this code in multi-threaded mode?"

    Given that one of the reasons the procedure is slow is the I/O cost of selecting from the temporary table, there's a good chance multi-threading would introduce further contention and actually make things worse. You should seek to improve the stored procedure first.