I have incoming data which is delimiter by a string token:
Data 1__##__Data Two__##__Third Data__##__4th Data__##__
I would like to split this by the delimiter __##__
to give:
Data 1
Data Two
Third Data
4th Data
I figured on looping and using this to get each data item in turn, but as a first stage to everything from the beginning to the nth
item inclusive:
SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 1 ) from dual;
The above gives the desired Data 1
, however:
SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 2 ) from dual;
expected to give Data1__##__Data Two
instead gives the whole input string back unchanged.
Earlier I tried the following but this just returns blank:
SELECT REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__',
'(?!__##__)',
1, LEVEL )
FROM dual
CONNECT BY REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__',
'(?!__##__)',
1, LEVEL ) IS NOT NULL
This returns null.
You do not need (slow) regular expressions and can use a recursive query and simple (faster) string functions:
WITH bounds (value, spos, epos) AS (
SELECT value,
1,
INSTR(value, '__##__', 1)
FROM table_name
UNION ALL
SELECT value,
epos + 6,
INSTR(value, '__##__', epos + 6)
FROM bounds
WHERE epos > 0
) SEARCH DEPTH FIRST BY value SET order_id
SELECT SUBSTR(value, spos, epos - spos) AS value
FROM bounds
WHERE epos > 0
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' FROM DUAL;
Outputs:
VALUE |
---|
Data 1 |
Data Two |
Third Data |
4th Data |
Your query does not work because Oracle does not support zero-width look-arounds (neither ahead nor behind, positive nor negative) in regular expressions. Although, even if Oracle did support it I'm not sure that it would give the answer you expect.
If you really want to use regular expressions, you can fix your query to:
SELECT REGEXP_SUBSTR(
'Data 1__##__Data Two__##__Third Data__##__4th Data__##__',
'(.*?)__##__',
1,
LEVEL,
NULL,
1
) AS value
FROM dual
CONNECT BY REGEXP_COUNT(
'Data 1__##__Data Two__##__Third Data__##__4th Data__##__',
'(.*?)__##__'
) >= LEVEL;
Which outputs the same as the previous query (although slower) for a single row of input (but would start to generate exponentially increasing numbers of duplicate rows if you provide multiple rows of input).