I have a table (using Oracle 19c) with three columns, two of them representing a parent-child relationship, and another one representing a textual description. I'm using connect by
without a start with
, but followed by a group by
, and Oracle decides to not apply the group by
if the description column is a not null
column with an unique
restriction, causing the rows that have been repeated, because of the absence of the start with
clause, to be included in the final output.
create table test_hie (id int, parent int, name varchar2(64) not null);
insert into test_hie (id, parent, name) values (0, null, 'ABC');
insert into test_hie (id, parent, name) values (1, 0, 'DEF');
create unique index test_hie_idx_name on test_hie (name);
alter session set statistics_level = all;
select id from test_hie connect by prior id = parent group by id;
select * from table(dbms_xplan.display_cursor('6pfqf6fg5crck', 0, 'ALLSTATS LAST PROJECTION'));
select id from test_hie connect by prior id = parent group by id, name;
select * from table(dbms_xplan.display_cursor('g56y8n3pubzud', 0, 'ALLSTATS LAST PROJECTION'));
Without the group by
or the start with
, row 1
would be included twice: as root of its own tree and as a child of 0
.
To remove duplicates, I apply a group by
. In the first query, with str
not included in the group by
list, row 1
only happens once, but if I include str
in the group by
, then it happens twice.
Output of query 1 (str not in the group by)
id
---
1
0
Output of query 2 (str in the group by)
id
---
1
0
1
That's behaviour is incorrect. If I'm grouping by id
, I shouldn't see any id
that is repeated.
Adding str
to the column list doesn't change anything, nor using any aggregate function or adding more columns to the table. It just suffices str
to be a non nullable column with an unique index to observe the incorrect behaviour.
Execution plan of the first query; notice that the plan contains a HASH GROUP BY
with a CONNECT BY WITHOUT FILTERING (UNIQUE)
:
SQL_ID 6pfqf6fg5crck, child number 0
-------------------------------------
select id from test_hie connect by prior id = parent group by id
Plan hash value: 2961238377
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | 7 | | | |
| 1 | HASH GROUP BY | | 1 | 2 | 2 |00:00:00.01 | 7 | 1968K| 1968K| 647K (0)|
|* 2 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | 1 | | 3 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 3 | TABLE ACCESS FULL | TEST_HIE | 1 | 2 | 2 |00:00:00.01 | 7 | | | |
----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("PARENT"=PRIOR NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (rowset=256) "ID"[NUMBER,22]
2 - "ID"[NUMBER,22], "PARENT"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
3 - "ID"[NUMBER,22], "PARENT"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Plan of the second query; notice that the group by
operation has dissappeared, and also that the CONNECT BY WITHOUT FILTERING
is not longer unique either:
SQL_ID g56y8n3pubzud, child number 0
-------------------------------------
select id from test_hie connect by prior id = parent group by id, name
Plan hash value: 4109999158
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | 7 | | | |
|* 1 | CONNECT BY WITHOUT FILTERING| | 1 | | 3 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 2 | TABLE ACCESS FULL | TEST_HIE | 1 | 2 | 2 |00:00:00.01 | 7 | | | |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PARENT"=PRIOR NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "ID"[NUMBER,22], "PARENT"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
2 - "ID"[NUMBER,22], "PARENT"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Is that an Oracle 19c bug? It looks like that it somehow "thinks" that repetitions are not possible (because of the non-null unique column) and so grouping is unnecessary, forgetting that the connect by
is artificially injecting repeated rows in the row set.
By the way, what does unique
really means in CONNECT BY WITHOUT FILTERING (UNIQUE)
?
Yeah that looks like a bug to me (which I'll log). I suspect we're reading "too much" into the unique index. We see the GROUP BY 'unique col' and go "Cool, we don't need to group by", which can be seen from a 10053 trace
QB before group-by removal:******* UNPARSED QUERY IS *******
SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT" GROUP BY "TEST_HIE"."ID","TEST_HIE"."NAME"
QB before group-by elimination:******* UNPARSED QUERY IS *******
SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT" GROUP BY "TEST_HIE"."ID","TEST_HIE"."NAME"
Registered qb: SEL$47952E7A 0x78ddb940 (ELIMINATION OF GROUP BY SEL$1; SEL$1)
---------------------
QUERY BLOCK SIGNATURE
---------------------
signature (): qb_name=SEL$47952E7A nbfros=1 flg=0
fro(0): flg=0 objn=189661 hint_alias="TEST_HIE"@"SEL$1"
QB after group-by elimination:******* UNPARSED QUERY IS *******
SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT"
Registered qb: SEL$9BB7A81A 0x78ddb940 (ELIMINATION OF GROUP BY SEL$47952E7A; SEL$47952E7A)
As an interim measure, you could add a redundant START WITH
SQL> create table test_hie (id int, parent int, name varchar2(64) not null);
Table created.
SQL> insert into test_hie (id, parent, name) values (0, null, 'ABC');
1 row created.
SQL> insert into test_hie (id, parent, name) values (1, 0, 'DEF');
1 row created.
SQL> create unique index test_hie_idx_name on test_hie (name);
Index created.
SQL>
SQL> select id from test_hie connect by prior id = parent group by id;
ID
----------
1
0
SQL> select id from test_hie connect by prior id = parent group by id, name;
ID
----------
1
0
1
SQL> select id from test_hie start with 1=1 connect by prior id = parent group by id, name;
ID
----------
1
0