oracle-databaseperformanceplsqlquery-optimizationxmltable

Performance tuning of a PLSQL query on XML table


I need to perform some queries on a table on an Oracle database. This table contains XML values. Here below an example of the xml structure I have:

<Parent>
  <ParentId>382010</ParentId>
  <LastUpd>2023-03-01T22:59:10.456241</LastUpd>
  <UserId>0</UserId>
  <attrn>xxx</attrn>
  <Child>
     <ChildId>1</ChildId>
    <Attribute1 ID="1873" D="1466 Description">1466</Attribute1>
    <Attribute2 ID="1234" D="QWERTY Description" E="503" ED="Error 503 Description">QWERTY</Attribute2>
    <Attribute3 ID="4921" D="Other Description">YourValue</Attribute3>
  </Child>
  <Child>
    <ChildId>2</ChildId>
    <Attribute1 ID="1296" D="Some Description">1234</Attribute1>
    <Attribute2 ID="1234" D="Some Different Description">ABC</Attribute2>
    <Attribute3 ID="4921" D="Other Description"  E="501" ED="Error 501 Description">MyValye</Attribute3>
  </Child>
</Parent>

I need to select all the Cildren Attributes that have error, it can be identified by the "E" attribute (e.g. Attribute 2 in child 1 and Attribute 3 in child 2).

I'm not very expert on this kind of XML queries, after some tries I ended building the query below:

SELECT EXTRACTVALUE (VALUE (X), '/Parent/UserId')  AS USER_ID
      ,EXTRACTVALUE (VALUE (X), '/Parent/ParentId')  AS PARENT_ID
      ,EXTRACTVALUE (VALUE (X), '/Parent/attrn')  AS PARENT_ATTR_N_COL_NAME
      ,EXTRACTVALUE (VALUE (I), '/Child/ChildId') AS ROW_NUM
      ,CASE
         WHEN EXISTSNODE (VALUE (E), '/Attribute1/@E') = 1 THEN ATTR_ONE_COL_NAME
         WHEN EXISTSNODE (VALUE (E), '/Attribute2/@E') = 1 THEN ATTR_TWO_COL_NAME
         WHEN EXISTSNODE (VALUE (E), '/Attribute3/@E') = 1 THEN ATTR_THREE_COL_NAME
        END AS FIELD
      ,EXTRACTVALUE (VALUE(E), '/*/text()') as VALUE
      ,EXTRACTVALUE (VALUE(E), '/*/@E') as ERROR_CODE
      ,EXTRACTVALUE (VALUE(E), '/*/@ED') as ERROR_DESC
  FROM XML_TABLE X
      ,TABLE (XMLSEQUENCE (EXTRACT (VALUE (X), '/Parent/Child')))  I
      ,TABLE (XMLSEQUENCE (EXTRACT (VALUE (I), '/Child/*'))) E
 WHERE     EXTRACTVALUE (VALUE (X), '/Parent/ParentId') = 382010
       AND EXISTSNODE (VALUE (E), '/*/@E') = 1';                                                                                      

Of course the XML (and then the query) is actually much bigger, I posted just a simplified example to be more synthetic. Sometimes the parent node contains hundreds of children and sometimes I need to query thousands of parents, so I'm wondering if there is a way to optimize the above query and make it faster.

Many thanks :)



Solution

  • Consider replacing the deprecated EXTRACTVALUE functions with the newer and more powerful XMLTABLE. With XMLTABLE, all the parsing can be done at one time and in a single query. Then you can use that query to either perform DML operations or you can loop through it for more procedural control.

    The below code doesn't precisely match your results, but it's close enough to give you an idea of how it works.

    select
        parent.user_id, parent.parent_id, parent.parent_attr_n_col_name,
        child.row_num, attribute.value, attribute.error_code, attribute.error_desc
    from xmltest x,
    xmltable
    ('/Parent' passing x.a columns
        user_id number path 'UserId',
        parent_id number path 'ParentId',
        parent_attr_n_col_name path 'attrn'
    ) parent,
    xmltable
    ('/Parent/Child' passing x.a columns
        row_num number path 'ChildId'
    ) child,
    xmltable
    ('/Parent/Child/*' passing x.a columns
        value varchar2(100) path 'text()',
        error_code varchar2(100) path '@E',
        error_desc varchar2(100) path '@ED'
    ) attribute
    where parent_id = 382010
        and error_code is not null;
    

    Here's the schema I used for my test code:

    --drop table xmltest;
    
    create table xmltest(a xmltype);
    
    insert into xmltest values(xmltype('
    <Parent>
      <ParentId>382010</ParentId>
      <LastUpd>2023-03-01T22:59:10.456241</LastUpd>
      <UserId>0</UserId>
      <attrn>xxx</attrn>
      <Child>
         <ChildId>1</ChildId>
        <Attribute1 ID="1873" D="1466 Description">1466</Attribute1>
        <Attribute2 ID="1234" D="QWERTY Description" E="503" ED="Error 503 Description">QWERTY</Attribute2>
        <Attribute3 ID="4921" D="Other Description">YourValue</Attribute3>
      </Child>
      <Child>
        <ChildId>2</ChildId>
        <Attribute1 ID="1296" D="Some Description">1234</Attribute1>
        <Attribute2 ID="1234" D="Some Different Description">ABC</Attribute2>
        <Attribute3 ID="4921" D="Other Description"  E="501" ED="Error 501 Description">MyValye</Attribute3>
      </Child>
    </Parent>'));
    

    Here's your query, with some changes that work for me based on the previous schema. I'm not sure how your table is setup - with a REF instead of a value?

    SELECT
          EXTRACTVALUE (X.a, '/Parent/UserId')  AS USER_ID
          ,EXTRACTVALUE (X.a, '/Parent/ParentId')  AS PARENT_ID
          ,EXTRACTVALUE (X.a, '/Parent/attrn')  AS PARENT_ATTR_N_COL_NAME
          ,EXTRACTVALUE (I.column_value, '/Child/ChildId') AS ROW_NUM
    --      ,CASE
    --         WHEN EXISTSNODE (E.column_value, '/Attribute1/@E') = 1 THEN ATTR_ONE_COL_NAME
    --         WHEN EXISTSNODE (E.column_value, '/Attribute2/@E') = 1 THEN ATTR_TWO_COL_NAME
    --         WHEN EXISTSNODE (E.column_value, '/Attribute3/@E') = 1 THEN ATTR_THREE_COL_NAME
    --        END AS FIELD
          ,EXTRACTVALUE (E.column_value, '/*/text()') as VALUE
          ,EXTRACTVALUE (E.column_value, '/*/@E') as ERROR_CODE
          ,EXTRACTVALUE (E.column_value, '/*/@ED') as ERROR_DESC
      FROM xmltest X
          ,TABLE (XMLSEQUENCE (EXTRACT (X.a, '/Parent/Child')))  I
          ,TABLE (XMLSEQUENCE (EXTRACT (value(I), '/Child/*'))) E
     WHERE     EXTRACTVALUE (X.a, '/Parent/ParentId') = 382010
           AND EXISTSNODE (VALUE (E), '/*/@E') = '1';