regexoracleloopsclob

Oracle query to list all occurrences of a string and its subsequent values from CLOB field


I have a CLOB field in my Oracle table and would like to extract all occurrences of the string that matches the pattern 'RES_GetResData_Public_ScreenPrint' and its subsequent values (which are separated by '|' ).

Sample Data-

|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|


For the above-provided example, I used the below query to get all occurrences of the string 'RES_GetResData_Public_ScreenPrint' and its subsequent values (separated by | ) from the CLOB data.

The query is able to get all the occurrences of 'RES_GetResData_Public_ScreenPrint', but returns only the first subsequent value in all the rows.

**Query - **

select objectid, 
    REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1, 
    REGEXP_SUBSTR(DATA, '([^\|]+)', 1, 3) AS column2
from test_table cross join table(cast(multiset(select level 
from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))

Actual Result -

objectid column1 column2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028079
12345 RES_GetResData_Public_ScreenPrint013 F28028079

Expected Result -

objectid column1 column2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028081
12345 RES_GetResData_Public_ScreenPrint013 F28028007

I'm new to writing queries and any suggestion would be helpful.

Thank you !!


Solution

  • You do not need regular expression for this; although it is more to type you will probably find that simple string functions are faster than regular expressions:

    WITH line_bounds (objectid, data, spos, epos) AS (
      SELECT objectid,
             data,
             1,
             INSTR(data, CHR(10), 1)
      FROM   test_table
    UNION ALL
      SELECT objectid,
             data,
             epos + 1,
             INSTR(data, CHR(10), epos + 1)
      FROM   line_bounds
      WHERE  epos > 0
    )
    SEARCH DEPTH FIRST BY objectid SET orderid,
    lines (objectid, line) AS (
      SELECT objectid,
             CASE epos
             WHEN 0
             THEN SUBSTR(data, spos)
             ELSE SUBSTR(data, spos, epos -spos)
             END
      FROM   line_bounds
    ),
    match_bounds (objectid, line, res_spos, res_epos, next_epos) AS (
      SELECT objectid,
             line,
             INSTR(line, '|RES_GetResData_Public_ScreenPrint'),
             INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 1),
             INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 2)
      FROM   lines
    )
    SELECT objectid,
           SUBSTR(line, res_spos + 1, res_epos - res_spos - 1) AS column1,
           SUBSTR(line, res_epos + 1, next_epos - res_epos - 1) AS column2
    FROM   match_bounds
    WHERE  res_spos > 0;
    

    Which, for the sample data:

    CREATE TABLE test_table (objectid, data) AS
    SELECT 12345,
           EMPTY_CLOB() || '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|
    |00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|
    |-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|'
    FROM   DUAL;
    

    Outputs:

    OBJECTID COLUMN1 COLUMN2
    12345 RES_GetResData_Public_ScreenPrint010 F28028079
    12345 RES_GetResData_Public_ScreenPrint011 F28028081
    12345 RES_GetResData_Public_ScreenPrint013 F28028007

    If you did want to use regular expressions (please compare the two solutions performance on your data) then you can match the entire pattern and extract the value of the capturing groups:

    select objectid, 
           REGEXP_SUBSTR(
             DATA,
             '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
             1,
             column_value,
             NULL,
             1
           ) AS column1, 
           REGEXP_SUBSTR(
             DATA,
             '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
             1,
             column_value,
             NULL,
             2
           ) AS column2
    from   test_table
           cross join table(
             cast(
               multiset(
                 select level 
                 from   dual
                 connect by level <= regexp_count(DATA, '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|')
               ) as sys.odcinumberlist
             )
           )
    

    Which has the same output.

    fiddle