oracle-databaseplsqlregexp-replacevarchar2regexp-like

Remove Duplicate nested phrases from a string with Oracle Regexp_replace


as part of a PL/SQL script, we are setting a Varchar variable called 'V_COMMENT' as such:

V_COMMENT := INCOMING_COMMENT || '[' || V_COMMENT || ']';

because this statement is run potentially multiple times for each record, we will eventually end up with a comment like:

process 578 [process 456[process 123]]

There is an issue, when re running this script, sometimes a duplicate tag can be added as such:

process 123 [process 123 [process 000]]

or

process 456 [process 123 [process 123]]

Where a comment is repeated, and nested.

Is there a way to use Regexp_Replace() to remove duplicate tags, nested or otherwise?

OR ALTERNATIVELY

Is there a way to catch occurrences of these errant comments with regexp_like


Solution

  • You can use solution below to remove duplicates comments. in fact, I use dynamic sql within pl/sql to solve the problem.

    DECLARE
    v_comment CLOB ;
    v_comment2 CLOB ; 
    v_sql CLOB;
    
    BEGIN
    --V_COMMENT := INCOMING_COMMENT || '[' || V_COMMENT || ']'; 
    v_comment := 'process 456[process 123[process 123[process 456[process 000]]]]' ;
    
    v_sql := q'{
    with v_temp1 as (
      select '}'||v_comment||q'{' as comm0 from dual
      )
     , v_temp2 AS (
      select comm0
           , LEVEL lvl
           , regexp_count(comm0, '[^\[]+') cnt
           , trim(rtrim( regexp_substr(comm0, '[^\[]+', 1, LEVEL), ']' )) AS comm1
           , row_number()OVER(PARTITION BY comm0, trim(rtrim( regexp_substr(comm0, '[^\[]+', 1, LEVEL), ']' ))  ORDER BY LEVEL) rnb
      from v_temp1
      CONNECT BY LEVEL <= regexp_count(comm0, '[^\[]+')
     )
      SELECT listagg(comm1, '[') WITHIN GROUP (ORDER BY lvl) || 
             LPAD(']', regexp_count(listagg(comm1, '[')WITHIN GROUP (ORDER BY lvl), '\['), ']') comm2
      FROM v_temp2
      WHERE rnb = 1
    }'
    ;
    
    --dbms_output.put_line(v_sql); --test
    
    execute immediate v_sql into v_comment2
    ;
    
    dbms_output.put_line('input v_comment : ' ||v_comment ); --test
    dbms_output.put_line('output v_comment2 : '||v_comment2); --test
    END;
    /
    

    demo : pl/sql

    Here is the sql part of my solution :

    with v_temp1 as (
      select 'process 578 [process 456 [process 123]]' comm0 from dual union all
      select 'process 123 [process 123 [process 000]]' from dual union ALL
      select 'process 456 [process 123 [process 123 [process 456 ]]]' from dual
      )
     , v_temp2 AS (
      select comm0
           , LEVEL lvl
           , regexp_count(comm0, '[^\[]+') cnt
           , trim(rtrim( regexp_substr(comm0, '[^\[]+', 1, LEVEL), ']' )) AS comm1
           , row_number()OVER(PARTITION BY comm0, trim(rtrim( regexp_substr(comm0, '[^\[]+', 1, LEVEL), ']' ))  ORDER BY LEVEL) rnb
      from v_temp1
      CONNECT BY LEVEL <= regexp_count(comm0, '[^\[]+')
      /*You need to add the following two conditions when processing more than one row*/
      AND PRIOR comm0 = comm0
      AND PRIOR sys_guid() IS NOT NULL
     )
      SELECT comm0, listagg(comm1, '[') WITHIN GROUP (ORDER BY lvl) || 
             LPAD(']', regexp_count(listagg(comm1, '[')WITHIN GROUP (ORDER BY lvl), '\['), ']') comm2
      FROM v_temp2
      WHERE rnb = 1
      GROUP BY comm0
    ;
    

    demo : sql