sqldatabaseoracle-databasecost-based-optimizeranchor-modeling

Join elimination not working in Oracle with sub queries


I am able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modelling, but first I need to find a way around this problem. I'm using Oracle 12c Enterprise Edition Release 12.1.0.2.0.

DDL for my test case:

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some example data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;

The 5NF view

This first view does not compile - it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined when I'm looking at the online examples of anchor modelling...

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt at fixing it. When using this view with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------

The only solution I have found is to use scalar sub queries instead, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------

Now Oracle sucessfully eliminates the product_price table. However, scalar sub queries are implemented differently than joins and they way they are executed simply doesn't allow me to get any acceptable performance in a real world scenario.

TL;DR How can I rewrite the view product_5nf so that Oracle sucessfully eliminates both of the dependent tables?


Solution

  • I think you have two problems going on here.

    First, join elimination only works for certain, specific situations (PK-PK, PK-FK, etc). It is not a general thing where you can LEFT JOIN to any row set that will return a single row for each join key value and have Oracle eliminate the join.

    Second, even if Oracle were advanced enough to do join elimination on ANY LEFT JOIN where it knew it would get only one row per join key value, Oracle does not yet support join eliminations on LEFT JOINS that are based on a composite key (Oracle support document 887553.1 says this is coming in R12.2).

    One workaround you could consider is materializing a view with the last row for each product_id. Then LEFT JOIN to the materialized view. Like this:

    create table product(
       product_id number not null
      ,constraint product_pk primary key(product_id)
    );
    
    create table product_color(
       product_id  number         not null references product
      ,color       varchar2(10)   not null
      ,constraint product_color_pk primary key(product_id)
    );
    
    create table product_price(
       product_id  number   not null references product
      ,from_date   date     not null
      ,price       number   not null
      ,constraint product_price_pk  primary key (product_id, from_date )
    );
    
    -- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
    -- the latest row by taking the MAX() of this column.
    alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );
    
    -- Create a MV snapshot so we can materialize a view having only the latest
    -- row for each product_id and can refresh that MV fast on commit.
    create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;
    
    -- Create the MV
    create materialized view product_price_latest refresh fast on commit enable query rewrite as
    SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
    FROM   product_price
    GROUP BY product_id;
    
    -- Create a primary key on the MV, so we can do join elimination
    alter table product_price_latest add constraint ppl_pk primary key ( product_id );
    
    -- Insert the OP's test data
    insert into product values(1);
    insert into product values(2);
    insert into product values(3);
    insert into product values(4);
    
    insert into product_color values(1, 'Red');
    insert into product_color values(2, 'Green');
    
    insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
    insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
    insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);
    
    insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);
    
    insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);
    
    commit;
    
    -- Create the 5NF view using the materialized view
    create or replace view product_5nf as
       select p.product_id
             ,pc.color
             ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
             ,to_number(substr(ppl.sortable_row,25)) price 
         from product p
         left join product_color pc on pc.product_id = p.product_id
         left join product_price_latest ppl on ppl.product_id = p.product_id 
    ;
    
    -- The plan for this should not include any of the unnecessary tables.
    select product_id from product_5nf;
    
    -- Check the plan
    SELECT *
    FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                             'ALLSTATS LAST'));
    
    ------------------------------------------------
    | Id  | Operation        | Name       | E-Rows |
    ------------------------------------------------
    |   0 | SELECT STATEMENT |            |        |
    |   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
    ------------------------------------------------