Tags: sql, xml, oracle-database, query-optimization, oracle18c

Join to XML query is only performant with redundant hardcoded WHERE clause


Oracle 18c:

I have a query that joins from a view to a subquery (related post: Join between XML queries).

with
subtype as (select * from sub_lc_events_asset_class_activity_vw),
domain  as (select      
                cast(rownum as number(38,0)) as rownum_,
                x.code,
                x.description,
                i.name as domain_name
            from        
                sde.gdb_items_vw i
            cross apply xmltable(
                '/GPCodedValueDomain2/CodedValues/CodedValue' 
                passing xmltype(i.definition)
                columns
                    code        varchar2(255) path './Code',
                    description varchar2(255) path './Name'
                ) x    
            where      
                i.name in('ACTIVITY_ATN','ACTIVITY_GCSM','ACTIVITY_MS','ACTIVITY_RD','ACTIVITY_SS_SL','ACTIVITY_WAT_RES','ACTIVITY_SWM_FACILITY','ACTIVITY_UND_SERV','ACTIVITY_BARRIER','ACTIVITY_WM_SAN')
                and i.name is not null)
select
    d.code as domain_code,
    d.description as domain_description,
    d.domain_name,
    s.subtype_code,
    s.subtype_description,
    s.subtype_field,
    s.subtype_field_domain,
    s.table_name as table_name
from
    subtype s
left join
    domain d
    on s.subtype_field_domain = d.domain_name

Explain Plan Screenshot

Normally, I'd include the explain plan as text rather than as a screenshot. But this particular plan is a nightmare because of the XML extraction and the underlying data source (a view called GDB_ITEMS_VW, which is beyond my control). As text, it would be unreadable and would make the post far too long, so I've linked a screenshot from SQL Developer instead.
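If a text plan is ever needed, DBMS_XPLAN can produce a compact one. This is a sketch, assuming the query has just been run in the same session with server output disabled (so it really is the last statement executed) and that the user can read the dynamic performance views that DISPLAY_CURSOR relies on. The BASIC format keeps only the operation tree, and +PREDICATE adds the access and filter predicates, which is where a NAME condition reaching GDB_ITEMS_VW would show up. In the plan for the version that selects ROWNUM, a COUNT operation beneath a VIEW for the DOMAIN sub-query is usually the sign that it was not merged.

```sql
-- Run straight after the slow query, in the same session.
-- sql_id => null asks for the last statement this session executed.
select plan_table_output
from   table(dbms_xplan.display_cursor(
           sql_id => null,
           format => 'BASIC +PREDICATE'));
```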

DOMAIN_COD DOMAIN_DESCRIPTION                       DOMAIN_NAME   SUBTYPE_CODE  SUBTYPE_DESCRIPTION  SUBTYPE_FIELD SUBTYPE_FIELD_DOMAIN TABLE_NAME        
---------- ---------------------------------------- ------------- ------------- -------------------- ------------- -------------------- ------------------
RECON_CL   RECONSTRUCT CYCLING LANE                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRS  RECONSTRUCT CYCLING ROUTE - SIGNED ONLY  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRU  RECONSTRUCT CYCLING ROUTE - URBAN PAVED  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_FW   RECONSTRUCT FACILITY WALKWAY             ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_MUP  RECONSTRUCT MULTI-USE PATH               ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_NT   RECONSTRUCT NATURE TRAIL                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_PP   RECONSTRUCT PARK PATHWAY                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_R    RECONSTRUCT RAMP                         ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_S    RECONSTRUCT SIDEWALK                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_STWY RECONSTRUCT STAIRWAY                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
...
121 rows selected. 

The query is very fast: 0.09 seconds.


However, that query is hardcoded to certain domain names:

where
    i.name in('ACTIVITY_ATN','ACTIVITY_GCSM','ACTIVITY_MS','ACTIVITY_RD','ACTIVITY_SS_SL','ACTIVITY_WAT_RES','ACTIVITY_SWM_FACILITY','ACTIVITY_UND_SERV','ACTIVITY_BARRIER','ACTIVITY_WM_SAN')
    and i.name is not null)

I want to avoid hardcoding the domain name if I can help it.

If I remove the hardcoded domain names from the WHERE clause, the query still produces the same result. That's expected because of the left join: domain rows whose name doesn't match any SUBTYPE_FIELD_DOMAIN are dropped by the join anyway.

... 
where      
    --I removed the domain names from the WHERE clause.
    i.name is not null)
...

Explain Plan Screenshot

DOMAIN_COD DOMAIN_DESCRIPTION                       DOMAIN_NAME   SUBTYPE_CODE  SUBTYPE_DESCRIPTION  SUBTYPE_FIELD SUBTYPE_FIELD_DOMAIN TABLE_NAME        
---------- ---------------------------------------- ------------- ------------- -------------------- ------------- -------------------- ------------------
RECON_CL   RECONSTRUCT CYCLING LANE                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRS  RECONSTRUCT CYCLING ROUTE - SIGNED ONLY  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRU  RECONSTRUCT CYCLING ROUTE - URBAN PAVED  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_FW   RECONSTRUCT FACILITY WALKWAY             ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_MUP  RECONSTRUCT MULTI-USE PATH               ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_NT   RECONSTRUCT NATURE TRAIL                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_PP   RECONSTRUCT PARK PATHWAY                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_R    RECONSTRUCT RAMP                         ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_S    RECONSTRUCT SIDEWALK                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_STWY RECONSTRUCT STAIRWAY                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
...
121 rows selected. 

Unfortunately, the query is now extremely slow: 35 seconds.

I don't understand why the query is so much slower now. I wouldn't have expected the domain names in the WHERE clause to serve much purpose, since the left join already has the same effect: it discards the same domain rows.


Question:

Why is the query only performant when the domains are hardcoded in the WHERE clause?

Apologies if I haven't included enough information, but I felt this post was long enough already. It's hard to know how much information hits the "sweet spot" (not too much, not too little). If more is needed, feel free to let me know.


Solution

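  • There is a combination of factors:

    If you remove ROWNUM from the SELECT clause of the sub-query, then:

    1. The SQL engine no longer needs to materialise the sub-query. ROWNUM has to be assigned to every row the sub-query produces before any outer condition is applied, so a sub-query that selects ROWNUM cannot be merged into the outer query or have predicates pushed into it.
    2. This gives the optimizer the opportunity to rewrite the query: it can merge the sub-query into the outer query and apply the LEFT JOIN condition before it generates the XMLTABLE rows.
    3. Once the query is rewritten, the ON condition can supply the NAME values used to probe the index, similar to how the hardcoded i.name in (...) list works inside the sub-query. The full table scan is eliminated and index scans are used again.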
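Applied to the query from the question, the change might look like this. It is a sketch rather than a verified fix: it simply drops the ROWNUM_ column (the outer SELECT never references it) and keeps everything else, including the unfiltered "i.name is not null" condition. Whether the optimizer actually merges the DOMAIN sub-query still depends on the plan it chooses, so the execution plan is worth re-checking afterwards.

```sql
with
subtype as (select * from sub_lc_events_asset_class_activity_vw),
domain  as (select
                -- No ROWNUM here, so this sub-query can be merged into the
                -- outer query and the join condition can reach GDB_ITEMS_VW.
                x.code,
                x.description,
                i.name as domain_name
            from
                sde.gdb_items_vw i
            cross apply xmltable(
                '/GPCodedValueDomain2/CodedValues/CodedValue'
                passing xmltype(i.definition)
                columns
                    code        varchar2(255) path './Code',
                    description varchar2(255) path './Name'
                ) x
            where
                -- No hardcoded list of domain names.
                i.name is not null)
select
    d.code as domain_code,
    d.description as domain_description,
    d.domain_name,
    s.subtype_code,
    s.subtype_description,
    s.subtype_field,
    s.subtype_field_domain,
    s.table_name as table_name
from
    subtype s
left join
    domain d
    on s.subtype_field_domain = d.domain_name
```

If a row number is still needed, one option is to compute it in the final SELECT, after the join, rather than inside DOMAIN; whether that keeps the fast plan would need checking against the real data.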