oraclehierarchy

Oracle: connect by without start with, but group by + a non-null unique column: group by ignored


I have a table (using Oracle 19c) with three columns, two of them representing a parent-child relationship, and another one representing a textual description. I'm using connect by without a start with, but followed by a group by, and Oracle decides to not apply the group by if the description column is a not null column with an unique restriction, causing the rows that have been repeated, because of the absence of the start with clause, to be included in the final output.

create table test_hie (id int, parent int, name varchar2(64) not null);
insert into test_hie (id, parent, name) values (0, null, 'ABC');
insert into test_hie (id, parent, name) values (1, 0, 'DEF');
create unique index test_hie_idx_name on test_hie (name);

alter session set statistics_level = all;

select id from test_hie connect by prior id = parent group by id;
select * from table(dbms_xplan.display_cursor('6pfqf6fg5crck', 0, 'ALLSTATS LAST PROJECTION'));

select id from test_hie connect by prior id = parent group by id, name;
select * from table(dbms_xplan.display_cursor('g56y8n3pubzud', 0, 'ALLSTATS LAST PROJECTION'));

Without the group by or the start with, row 1 would be included twice: as root of its own tree and as a child of 0.

To remove duplicates, I apply a group by. In the first query, with str not included in the group by list, row 1 only happens once, but if I include str in the group by, then it happens twice.

Output of query 1 (str not in the group by)
id 
---
1
0

Output of query 2 (str in the group by)

id
---
1
0
1

That's behaviour is incorrect. If I'm grouping by id, I shouldn't see any id that is repeated.

Adding str to the column list doesn't change anything, nor using any aggregate function or adding more columns to the table. It just suffices str to be a non nullable column with an unique index to observe the incorrect behaviour.

Execution plan of the first query; notice that the plan contains a HASH GROUP BY with a CONNECT BY WITHOUT FILTERING (UNIQUE):

SQL_ID  6pfqf6fg5crck, child number 0
-------------------------------------
select id from test_hie connect by prior id = parent group by id
 
Plan hash value: 2961238377
 
----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |          |      1 |        |      2 |00:00:00.01 |       7 |       |       |          |
|   1 |  HASH GROUP BY                         |          |      1 |      2 |      2 |00:00:00.01 |       7 |  1968K|  1968K|  647K (0)|
|*  2 |   CONNECT BY WITHOUT FILTERING (UNIQUE)|          |      1 |        |      3 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS FULL                   | TEST_HIE |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("PARENT"=PRIOR NULL)
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - (rowset=256) "ID"[NUMBER,22]
   2 - "ID"[NUMBER,22], "PARENT"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
   3 - "ID"[NUMBER,22], "PARENT"[NUMBER,22]
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Plan of the second query; notice that the group by operation has dissappeared, and also that the CONNECT BY WITHOUT FILTERING is not longer unique either:

SQL_ID  g56y8n3pubzud, child number 0
-------------------------------------
select id from test_hie connect by prior id = parent group by id, name
 
Plan hash value: 4109999158
 
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |        |      3 |00:00:00.01 |       7 |       |       |          |
|*  1 |  CONNECT BY WITHOUT FILTERING|          |      1 |        |      3 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   2 |   TABLE ACCESS FULL          | TEST_HIE |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("PARENT"=PRIOR NULL)
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - "ID"[NUMBER,22], "PARENT"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
   2 - "ID"[NUMBER,22], "PARENT"[NUMBER,22]
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Is that an Oracle 19c bug? It looks like that it somehow "thinks" that repetitions are not possible (because of the non-null unique column) and so grouping is unnecessary, forgetting that the connect by is artificially injecting repeated rows in the row set.

By the way, what does unique really means in CONNECT BY WITHOUT FILTERING (UNIQUE) ?


Solution

  • Yeah that looks like a bug to me (which I'll log). I suspect we're reading "too much" into the unique index. We see the GROUP BY 'unique col' and go "Cool, we don't need to group by", which can be seen from a 10053 trace

    QB before group-by removal:******* UNPARSED QUERY IS *******
    SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT" GROUP BY "TEST_HIE"."ID","TEST_HIE"."NAME"
    QB before group-by elimination:******* UNPARSED QUERY IS *******
    SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT" GROUP BY "TEST_HIE"."ID","TEST_HIE"."NAME"
    Registered qb: SEL$47952E7A 0x78ddb940 (ELIMINATION OF GROUP BY SEL$1; SEL$1)
    ---------------------
    QUERY BLOCK SIGNATURE
    ---------------------
      signature (): qb_name=SEL$47952E7A nbfros=1 flg=0
        fro(0): flg=0 objn=189661 hint_alias="TEST_HIE"@"SEL$1"
    
    QB after group-by elimination:******* UNPARSED QUERY IS *******
    SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT"
    Registered qb: SEL$9BB7A81A 0x78ddb940 (ELIMINATION OF GROUP BY SEL$47952E7A; SEL$47952E7A)
    

    As an interim measure, you could add a redundant START WITH

    SQL> create table test_hie (id int, parent int, name varchar2(64) not null);
    
    Table created.
    
    SQL> insert into test_hie (id, parent, name) values (0, null, 'ABC');
    
    1 row created.
    
    SQL> insert into test_hie (id, parent, name) values (1, 0, 'DEF');
    
    1 row created.
    
    SQL> create unique index test_hie_idx_name on test_hie (name);
    
    Index created.
    
    SQL>
    SQL> select id from test_hie connect by prior id = parent group by id;
    
            ID
    ----------
             1
             0
    
    SQL> select id from test_hie connect by prior id = parent group by id, name;
    
            ID
    ----------
             1
             0
             1
    
    SQL> select id from test_hie start with 1=1 connect by prior id = parent group by id, name;
    
            ID
    ----------
             1
             0