oracle-databaseexistscorrelated-subquery

Oracle - Make a correlated query more efficient


Oracle RDBMS :18C

This topic is subsequent to my previous thread:
Oracle - Where clause having null value condition does not give corresponding result

Because the last question is solved (to obtain correct results), I closed it.
This is an another issue,regarding performance, so I raise this new question.
To make my question clearer,please read through above link first.

The Impaler's answer to last post using decode is just what I need.
But later I found that query is not very efficient because my outer table is much bigger than the inner table, which maybe the cause slowing down the query (compare to in operator, see speed test table below)

Due to proprietary concern, I can't upload my atual working table.
That being said, let me take below small sample table to explain my actual working table:
db<>fiddle

With Desired_Column_Combination as (
  Select 'A1' ColA,'B2' ColB,'C2' ColC from dual union all 
  Select 'A1' ColA,'B1' ColB,'' ColC from dual union all --Should've 3 records
  Select 'A2' ColA,'' ColB,'C3' ColC from dual union all --Should've 2 records
  Select 'A3' ColA,'B1' ColB,'C1' ColC from dual
)
select * 
from SampleTable S
where exists (
  select 1 from Desired_Column_Combination d
  where decode(s.ColA, d.ColA, 1, 0) = 1
    and decode(s.ColB, d.ColB, 1, 0) = 1
    and decode(s.ColC, d.ColC, 1, 0) = 1
);

In my atual working table:
<1>. Desired_Column_Combination is of user's choice and the amount of condition may up to 70 select ... from.... I use VBA to make a dynamic Desired_Column_Combination table based on user's choice.
<2>. All condition in Desired_Column_Combination is 100% from SampleTable, so there's at least 1 record in SampleTable corresponds to it in Desired_Column_Combination Table. In other words, Desired_Column_Combination table is a subset of SampleTable.
<3>. SampleTable have indices on ColA,B and C.
<4>. I only have query table priviledge, can't delete/update/alter table.

I then tried to understand exists operator, hoping solve it by myself.
Here's a brief conclusion after a few hours of gooling:
Generally, if outer table is much larger than inner table, then using in operator is better. On the other hand, if inner table contains more rows than outer table and inner table have indices, then using exists is faster than in.

Then I do some speed test on my working table: Speed Test

It seems the test results is the same as my google conclusion, because my outer table, SampleTable, is much larger than inner table, Desired_Column_Combination, so exists operator gives worse performance than in operator.

My question is:
<1>. Is there a way to use in operator while not expand where-clause too much? (To my best knowledge, in order to use in under null value query, we need to specify every column to be null in where-clause whenever Desired_Column_Combination has a null value condition)
<2>. If exists is the only answer, can we re-write the query such that outer table becomes inner table and vice versa to make exists runs faster?


Solution

  • The trouble with the DECODE solution is that it cannot be indexed because the function has inputs from two different tables. If it's simply NULL-handling and you want the benefit of indexes, I suggest a simple NVL function and a matching index.

    select * 
    from SampleTable S
    where exists (
      select 1 from Desired_Column_Combination d
      where NVL(s.ColA,' ') = NVL(d.ColA,' ')
        and NVL(s.ColB,' ') = NVL(d.ColB,' ')
        and NVL(s.ColC,' ') = NVL(d.ColC,' ')
    );
    

    Now a matching index, and it's best to include all three columns when doing so:

    CREATE OR REPLACE INDEX index1 ON Desired_Column_Combination 
    (NVL(colA,' '),NVL(colB,' '),NVL(colC,' '))
    

    FYI, the reason for Oracle's default NULL logic is that NULL isn't intended to be a meaningful value that contributes to a row's identity. It is intended rather to mean "unknown" or nothing meaningful - it denotes the lack of anything meaningful, so normally you wouldn't want to find child rows if the join key columns have a NULL value in them. Such a row's identity would be undefined, unknowable, and uncertain. There are of course exceptions (e.g. when metadata is part of the key, like in type-2 history tables where you don't want to use an imaginary end date), and so Oracle allows it, but you have to do a bit of extra work to get them to be treated like meaningful key values that are part of a row's identity. NVL is the easiest way to do that, but applying a function to the column throws out index use unless you create a corresponding function-based index to match. But often when you find yourself in this situation there's something wrong with your data model, where NULL is being used in a matter it wasn't intended for.