A table EMPLOYEE with 5 million rows (5×10^6) has the structure below.
Name
------
EMPNAME
EMPID
MANAGERID (foreign key to same table)
STATUS
We insert into a different table EmpAct as follows:
INSERT INTO empact (empname, empid, status)
SELECT empName, empid, status
FROM employee e
WHERE e.status IN (1, 2, 3)
OR
EXISTS (SELECT 1
FROM employee m
WHERE m.empid = e.managerid AND
m.status IN (1,2,3)
)
This becomes a costly operation: for each employee who fails the status test (not status 1, 2, or 3), the EXISTS runs a lookup against the same 5-million-row table, which looks like an O(N^2) operation.
Is there a way to make it a linear O(N) operation?
Also, is a plain INSERT INTO ... SELECT fine here, or should we use some other PL/SQL construct for the inserts?
This query can be converted to O(N) by rewriting it into a form that supports hash joins. But the algorithm analysis gets tricky very quickly, and I'm not sure the new form will actually be faster.
create table employee
(
empname varchar2(100),
empid number primary key,
managerid number references employee(empid),
status number
);
create table empact as select empName, empid, status from employee where 1=0;
insert into employee
select level, level, level, 1 from dual connect by level <= 100000;
begin
dbms_stats.gather_table_stats(user, 'EMPLOYEE');
end;
/
explain plan for
INSERT INTO empact
SELECT empName, empid, status
FROM employee e
WHERE e.status in (1, 2, 3)
OR
EXISTS (SELECT 1
FROM employee m
WHERE m.empid = e.managerid AND
m.status IN (1,2,3)
);
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 961581243
------------------------------------------------------
| Id | Operation | Name |
------------------------------------------------------
| 0 | INSERT STATEMENT | |
| 1 | LOAD TABLE CONVENTIONAL | EMPACT |
| 2 | FILTER | |
| 3 | TABLE ACCESS FULL | EMPLOYEE |
| 4 | TABLE ACCESS BY INDEX ROWID| EMPLOYEE |
| 5 | INDEX UNIQUE SCAN | SYS_C0027564 |
------------------------------------------------------
The FILTER operation is a bit weird, but in this case I believe it's acting like a loop, so we can multiply operation 3 by operations 4/5 to find the total run time. The TABLE ACCESS FULL is O(N) and the INDEX UNIQUE SCAN is O(LOG(N)), so you should be seeing O(N*LOG(N)) instead of O(N^2).
If you're seeing two full table scans, that would be O(N^2), but then you should try to investigate why Oracle isn't using the index.
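To see why the index changes the asymptotics, here's a small Python sketch (an illustration only, with made-up names and data, not the actual Oracle mechanics): `naive` plays the role of the FILTER rescanning the whole table for each EXISTS probe, and `indexed` plays the role of a sorted index answered by binary search.

```python
import bisect

# Hypothetical in-memory model of the EMPLOYEE table:
# each row is (empname, empid, managerid, status).
employees = [(f"emp{i}", i, i % 10, i % 7) for i in range(1, 1001)]
active = {1, 2, 3}

# O(N^2): for every row that fails the status test, rescan the
# whole table looking for an active manager (the naive FILTER).
def naive(rows):
    out = []
    for name, empid, mgr, status in rows:
        if status in active or any(
            m_id == mgr and m_st in active
            for _, m_id, _, m_st in rows
        ):
            out.append((name, empid, status))
    return out

# O(N*LOG(N)): sort the (empid, status) pairs once, then answer each
# EXISTS probe with a binary search -- the INDEX UNIQUE SCAN analogue.
def indexed(rows):
    by_id = sorted((empid, status) for _, empid, _, status in rows)
    ids = [e for e, _ in by_id]

    def manager_active(mgr):
        i = bisect.bisect_left(ids, mgr)
        return i < len(ids) and ids[i] == mgr and by_id[i][1] in active

    return [(name, empid, status)
            for name, empid, mgr, status in rows
            if status in active or manager_active(mgr)]

# Both produce the same rows; only the probe cost differs.
assert naive(employees) == indexed(employees)
```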
If you want to compare data in O(N) I believe a hash join is the only option. Hash joins only work with equality conditions, and in this case I think Oracle isn't smart enough to understand how to rewrite your query into regular equality conditions. We can do it ourselves by splitting the query into two pieces and UNIONing them together:
explain plan for
INSERT INTO empact
SELECT empName, empid, status
FROM employee e
WHERE e.status in (1, 2, 3)
UNION
SELECT empName, empid, status
FROM employee e
WHERE EXISTS (SELECT 1
FROM employee m
WHERE m.empid = e.managerid AND
m.status IN (1,2,3)
);
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 3147379352
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | INSERT STATEMENT | |
| 1 | LOAD TABLE CONVENTIONAL | EMPACT |
| 2 | SORT UNIQUE | |
| 3 | UNION-ALL | |
| 4 | TABLE ACCESS FULL | EMPLOYEE |
| 5 | HASH JOIN | |
| 6 | TABLE ACCESS FULL | EMPLOYEE |
| 7 | TABLE ACCESS FULL | EMPLOYEE |
---------------------------------------------
The new plan is more complicated, but it uses a hash join in operation 5, which in theory is O(2N). There's still another O(N) TABLE ACCESS FULL on line 4, and an O(N*LOG(N)) SORT UNIQUE on line 2, although hopefully N is much smaller by that step.
You're thinking about this question the right way - more people should consider the algorithm complexity of their programs. And usually converting a join condition to something that supports hash joins is a good idea. But with this weird join condition I'm not sure if you'll see an improvement. This might be one of the many cases where the constants and implementation details outweigh the complexity.
If you want to learn more, here's a chapter I wrote about using algorithm analysis to understand SQL performance.