oracle-databaseregexp-substr

Oracle REGEXP_REPLACE string delimiter handling not working


I have incoming data which is delimiter by a string token:

Data 1__##__Data Two__##__Third Data__##__4th Data__##__

I would like to split this by the delimiter __##__ to give:

Data 1
Data Two
Third Data
4th Data

I figured on looping and using this to get each data item in turn, but as a first stage to everything from the beginning to the nth item inclusive:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 1 ) from dual;

The above gives the desired Data 1, however:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 2 ) from dual;

expected to give Data1__##__Data Two instead gives the whole input string back unchanged.

Earlier I tried the following but this just returns blank:

SELECT REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL )
FROM dual
CONNECT BY REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL ) IS NOT NULL

This returns null.


Solution

  • You do not need (slow) regular expressions and can use a recursive query and simple (faster) string functions:

    WITH bounds (value, spos, epos) AS (
      SELECT value,
             1,
             INSTR(value, '__##__', 1)
      FROM   table_name
    UNION ALL
      SELECT value,
             epos + 6,
             INSTR(value, '__##__', epos + 6)
      FROM   bounds
      WHERE  epos > 0
    ) SEARCH DEPTH FIRST BY value SET order_id
    SELECT SUBSTR(value, spos, epos - spos) AS value 
    FROM   bounds
    WHERE  epos > 0
    

    Which, for the sample data:

    CREATE TABLE table_name (value) AS
    SELECT 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' FROM DUAL;
    

    Outputs:

    VALUE
    Data 1
    Data Two
    Third Data
    4th Data

    Your query does not work because Oracle does not support zero-width look-arounds (neither ahead nor behind, positive nor negative) in regular expressions. Although, even if Oracle did support it I'm not sure that it would give the answer you expect.

    If you really want to use regular expressions, you can fix your query to:

    SELECT REGEXP_SUBSTR(
             'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
             '(.*?)__##__',
             1,
             LEVEL,
             NULL,
             1
           ) AS value
    FROM   dual
    CONNECT BY REGEXP_COUNT(
             'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
             '(.*?)__##__'
           ) >= LEVEL;
    

    Which outputs the same as the previous query (although slower) for a single row of input (but would start to generate exponentially increasing numbers of duplicate rows if you provide multiple rows of input).

    fiddle