So either this is an Oracle bug, or I'm being a bit slow today.
This SQL executes fine:
WITH car_paint_options AS (
SELECT 'Escort' car_model, 'red,blue' paint_opts FROM dual UNION
SELECT 'Puma' car_model, 'black' paint_opts FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
However, it gives me the wrong answer (4 lines of data):
rn level car_model paint_opt
--- ----- --------- ---------
1 1 Puma black
2 1 Focus red
3 2 Focus blue
4 2 Focus blue
The desired output is only 3 lines of data as shown here:
rn level car_model paint_opt
--- ----- --------- ---------
1 1 Puma black
2 1 Focus red
3 2 Focus blue
I understand why the problem occurs. Level 2 records are attempting to connect back to level 1 records, and what's happening is that the Focus:blue
option is matching back successfully to both Puma:black
and Focus:red
parent rows.
So now I'm thinking: "fine, simple enough fix, let's constrain the level 2
objects so that they only ever link back to parent objects of the same car_model
":
WITH car_paint_options AS (
SELECT 'Escort' car_model, 'red,blue' paint_opts FROM dual UNION
SELECT 'Puma' car_model, 'black' paint_opts FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
AND ep.car_model = prior ep.car_model
But it results in an error:
ORA-01436: CONNECT BY loop in user data
Tested in both Oracle 12c + 19c. Can someone confirm I'm not doing anything silly here? Am I right in thinking it's a bug?
The normal "hack" for this is to add something to the CONNECT BY
clause that gives each row a unique value within a filter that always evaluates to true such as SYS_GUID() IS NOT NULL
or DBMS_RANDOM.VALUE() IS NOT NULL
and prevents the hierarchical query from detecting cycles:
WITH car_paint_options (car_model, paint_opts) AS (
SELECT 'Escort', 'red,blue' FROM dual UNION ALL
SELECT 'Puma', 'black' FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
AND ep.car_model = prior ep.car_model
AND PRIOR SYS_GUID() IS NOT NULL
Which outputs:
RN | LEVEL | CAR_MODEL | PAINT_OPT |
---|---|---|---|
1 | 1 | Escort | red |
2 | 1 | Puma | black |
3 | 2 | Escort | blue |
Regular expression functions are slow and it is often faster to use simple string functions, even if it means you need to type more (and if simple string functions are faster than regular expressions then they are even faster than regular expressions plus generating a GUID for each row):
WITH car_paint_options (car_model, paint_opts) AS (
SELECT 'Escort', 'red,blue' FROM dual UNION ALL
SELECT 'Puma', 'black' FROM dual
),
bounds (car_model, paint_opts, lvl, spos, epos) AS (
SELECT car_model,
paint_opts,
1,
1,
INSTR(paint_opts, ',', 1)
FROM car_paint_options
UNION ALL
SELECT car_model,
paint_opts,
lvl + 1,
epos + 1,
INSTR(paint_opts, ',', epos + 1)
FROM bounds
WHERE epos > 0
)
SELECT ROW_NUMBER() OVER (ORDER BY lvl) AS rn,
lvl,
car_model,
CASE epos
WHEN 0
THEN SUBSTR(paint_opts, spos)
ELSE SUBSTR(paint_opts, spos, epos - spos)
END AS paint_opt
FROM bounds
Which outputs the same.
RN | LVL | CAR_MODEL | PAINT_OPT |
---|---|---|---|
1 | 1 | Escort | red |
2 | 1 | Puma | black |
3 | 2 | Escort | blue |
If you did want to still use regular expressions and want to avoid SYS_GUID
hacks then, from Oracle 12, you can generate the level values using a LATERAL
join (or CROSS APPLY
):
WITH car_paint_options (car_model, paint_opts) AS (
SELECT 'Escort', 'red,blue' FROM dual UNION ALL
SELECT 'Puma', 'black' FROM dual
)
SELECT row_number() over(order by depth) rn,
depth,
ep.car_model,
regexp_substr(ep.paint_opts, '[^,]+', 1, depth) AS paint_opt
FROM car_paint_options ep
CROSS JOIN LATERAL (
SELECT LEVEL AS depth
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(ep.paint_opts, '[^,]+')
)
Which also outputs the same.
A comparison of the performance of different options for splitting delimited strings is given in this answer. The summary is that option 2 tends to be the most performant (and the most to type).