sqloraclesql-optimization

How to select count from XML clob column?


I am trying to select the count of tags in the XML(CLOB) column.

I tried with

select regexp_count(diagram, 'userTask id=', 1, 'c') as "User Tasks",
       regexp_count(diagram, 'task id=', 1, 'c') as "Task"
from process_table

It works but, I need to get output faster than using regexp_count.

I tried with:

select count(xt.task),
count(xt.userTask)
from process_table process
cross join xmltable(
            xmlnamespaces(default 'http://www.omg.org/spec/BPMN/20100524/MODEL'),
            '//definitions/process' passing xmltype(process.diagram) 
            columns
            task varchar2(20) path 'task',
            userTask varchar2(60) path 'userTask'
            ) xt

But, I am getting an error ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence


Solution

  • There are a few ways to do this. One is to use XMLTable to look for either child node, get the node names, and count how many times each appears:

    select
      count(case when xt.name = 'userTask' then name end) as userTasks,
      count(case when xt.name = 'task' then name end) as tasks
    from process_table process
    cross join xmltable(
      xmlnamespaces(default 'http://www.omg.org/spec/BPMN/20100524/MODEL'),
      '//definitions/process/(userTask|task)' passing xmltype(process.diagram) 
      columns
        name varchar2(20) path 'name(.)'
    ) xt
    

    Or you could use FLWOR expressions to get the count for both child node types at once:

    select userTasks, tasks
    from process_table process
    cross join xmltable(
      xmlnamespaces(default 'http://www.omg.org/spec/BPMN/20100524/MODEL'),
      'let $u := count(//definitions/process/userTask)
       let $s := count(//definitions/process/task)
       return <x><u>{$u}</u><s>{$s}</s></x>'
      passing xmltype(process.diagram) 
      columns
        userTasks number path 'u',
        tasks number path 's'
    ) xt
    

    but I'm not sure that would end up any faster than the first option.

    Another option is to use separate XMLQuery XPath counts for each node to check:

    select
      xmlquery('declare default element namespace "http://www.omg.org/spec/BPMN/20100524/MODEL";
          count(//definitions/process/userTask)'
        passing xmltype(diagram)
        returning content) as userTasks,
      xmlquery('declare default element namespace "http://www.omg.org/spec/BPMN/20100524/MODEL";
          count(//definitions/process/task)'
        passing xmltype(diagram)
        returning content) as tasks
    from process_table
    

    db<>fiddle with a simple made-up XML CLOB that throws the error you see, and those three approaches (including converting the XMLQuery results to numbers).

    I'm interested to see what other people come up with too.


    XPath counts returns ORA-06502: PL/SQL: numeric or value error when there are more than 10 rows. When I add where clause works perfectly

    When I add id = some_number, or when 10 rows are displayed it works.

    This will error if a processed row has a null diagram; but you can exclude those by just adding where diagram is not null.

    db<>fiddle