I have a SQL-Statement which uses a Check-Function
which takes a quite long time for execution.
Now I want to Parallize the execution of the Check-Function
, but it does not work.
Where is the mistake I made?
The sample below takes 5 Seconds to execute, but for my understanding it should take about 1 Second because of the parallelism.
Code Snippet for Testing:
CREATE TABLE PERSON AS
SELECT LEVEL AS ID, 'Person_'||LEVEL AS NAME
FROM DUAL
CONNECT BY LEVEL <= 5;
CREATE OR REPLACE FUNCTION LONGCHECKFUNC(ID NUMBER)
RETURN NUMBER IS
BEGIN
--Doing some very heavy Checks....
DBMS_LOCK.SLEEP(1 /*second*/);
RETURN 1;
END;
SELECT /*+PARALLEL(person, 5) */ *
FROM PERSON
WHERE LONGCHECKFUNC(ID)=1;
CREATE TABLE PERSON
PARTITION BY HASH(ID) PARTITIONS 16 /* <-- ADDED*/
AS
SELECT LEVEL AS ID, 'Person_'||LEVEL AS NAME
FROM DUAL
CONNECT BY LEVEL <= 5;
CREATE OR REPLACE FUNCTION LONGCHECKFUNC(ID NUMBER)
RETURN NUMBER PARALLEL_ENABLE /* <-- ADDED*/ IS
BEGIN
--Doing some very heavy Checks....
DBMS_LOCK.SLEEP(1 /*second*/);
RETURN 1;
END;
/
SELECT /*+PARALLEL(person, 5) */ *
FROM PERSON
WHERE LONGCHECKFUNC(ID)=1;
First, you need to add PARALLEL_ENABLE to your function. This tells Oracle that function calls do not share any session data, and can be run independently.
The second change, adding hash partitioning, I don't fully understand. It depends on the internal algorithms Oracle uses to divide work loads. When there are hash partitions, it's probably easiest to divide the segments among the parallel servers. Without partitioning, there is only a small number of blocks, and Oracle probably assumes it will be fastest to run everything on a single parallel server.
(Even if you use ASSOCIATE STATISTICS
and give the function a ridiculous cost, Oracle will still run it serially. Perhaps there is some limitation where Oracle will never split a block among multiple parallel servers?)
This runs in about 1.1. seconds on my machine. But since it depends on (AFAIK) undocumented behavior, I'm not sure if it will run the same for you. Oracle's hashing functions do not put values in buckets in a simple round-robin fashion. To minimize collisions, and increase the chance of optimal parallelism, you will need to use a large number of partitions.
As @David Aldridge mentioned, parallel query is not really designed for this. If you want a more deterministic process, you'll need something like the solution suggested by @Polppan (use DBMS_SCHEDULER).