Oracle REGEXP_REPLACE string delimiter handling not working

I have incoming data which is delimiter by a string token:

Data 1__##__Data Two__##__Third Data__##__4th Data__##__

I would like to split this by the delimiter __##__ to give:

Data 1
Data Two
Third Data
4th Data

I figured on looping and using this to get each data item in turn, but as a first stage to everything from the beginning to the nth item inclusive:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 1 ) from dual;

The above gives the desired Data 1, however:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 2 ) from dual;

expected to give Data1__##__Data Two instead gives the whole input string back unchanged.

Earlier I tried the following but this just returns blank:

SELECT REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL )
FROM dual
CONNECT BY REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL ) IS NOT NULL

This returns null.

Solution

You do not need (slow) regular expressions and can use a recursive query and simple (faster) string functions:

WITH bounds (value, spos, epos) AS (
  SELECT value,
         1,
         INSTR(value, '__##__', 1)
  FROM   table_name
UNION ALL
  SELECT value,
         epos + 6,
         INSTR(value, '__##__', epos + 6)
  FROM   bounds
  WHERE  epos > 0
) SEARCH DEPTH FIRST BY value SET order_id
SELECT SUBSTR(value, spos, epos - spos) AS value 
FROM   bounds
WHERE  epos > 0

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' FROM DUAL;

Outputs:

VALUE
Data 1
Data Two
Third Data
4th Data

Your query does not work because Oracle does not support zero-width look-arounds (neither ahead nor behind, positive nor negative) in regular expressions. Although, even if Oracle did support it I'm not sure that it would give the answer you expect.

If you really want to use regular expressions, you can fix your query to:

SELECT REGEXP_SUBSTR(
         'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
         '(.*?)__##__',
         1,
         LEVEL,
         NULL,
         1
       ) AS value
FROM   dual
CONNECT BY REGEXP_COUNT(
         'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
         '(.*?)__##__'
       ) >= LEVEL;

Which outputs the same as the previous query (although slower) for a single row of input (but would start to generate exponentially increasing numbers of duplicate rows if you provide multiple rows of input).

fiddle