oracleregexp-replace

Oracle regexp_replace


I’m currently stuck on the regexp_replace to make a working pattern for updating thousands of rows…

Strings will look like this:

nvl(replace('$$TAG1$$$$TAG2$$$$TAG3$$' …) … $$TAG4$$ = '12345678' … INSTR('product1, product2,',',$$TAG5$$,') … GET_CODE($$TAG6$$, 'something')

I modified string a bit, but key parts stay the same.

I need to replace (add quote encasing) only to those $$TAG$$ which are not in quotes yet. Those would be &&TAG6&& and $$TAG4$$

Current pattern:

SELECT REGEXP_REPLACE(
    q'{nvl(replace('$$TAG1$$$$TAG2$$$$TAG3$$' …) … $$TAG4$$ = '12345678' … INSTR('product1, product2,',',$$TAG5$$,') … GET_CODE($$TAG6$$, 'something')}',
    q'{(\'\${2}.*\${4}.*\${2}\')|(',\${2}[A-Za-z0-9_/i]+\${2},')|(\${2}[A-Za-z0-9_]+?\${2})}',
    q'{\1\2'\3'}'
) AS result
FROM dual;

Result:

nvl(replace('$$TAG1$$$$TAG2$$$$TAG3$$''' …) … '$$TAG4$$' = '12345678' … INSTR('product1, product2,',',$$TAG5$$,''') … GET_CODE('$$TAG6$$', 'something')

As you can see, after TAG3, and TAG5, appears two additional quotes..

Please help:)


Solution

    1. Split the string into tokens using a row-generator;
    2. Get a running count of the quotes;
    3. If a tag is not inside a quote then surround it by quotes; and
    4. Aggregate the string.

    Like this:

    SELECT s.*
    FROM   table_name t
           CROSS JOIN LATERAL (
             SELECT LISTAGG(
                      CASE
                      WHEN term LIKE '$$%$$'
                      AND  MOD(num_quotes,2) = 0
                      THEN '''' || term || ''''
                      ELSE term
                      END,
                      NULL
                    ) WITHIN GROUP (ORDER BY pos) AS value
             FROM   (
               SELECT REGEXP_SUBSTR(t.value, q'!'|\$\$.*?\$\$|(\$?[^'$]+)+!', 1, LEVEL) AS term,
                      COUNT(
                        CASE
                        WHEN REGEXP_SUBSTR(t.value, q'!'|\$\$.*?\$\$|(\$?[^'$]+)+!', 1, LEVEL) = ''''
                        THEN 1
                        END
                      ) OVER (ORDER BY LEVEL) AS num_quotes,
                      LEVEL AS pos
               FROM   DUAL
               CONNECT BY LEVEL <= REGEXP_COUNT(t.value, q'!'|\$\$.*?\$\$|(\$?[^'$]+)+!')
             )
           ) s;
    

    Which, for the sample data:

    CREATE TABLE table_name (value) AS
      SELECT q'{nvl(replace('$$TAG1$$$$TAG2$$$$TAG3$$' ...) ... $$TAG4$$ = '12345678' ... INSTR('product1, product2,',',$$TAG5$$,') ... GET_CODE($$TAG6$$, 'something')}' FROM DUAL;
    

    Outputs:

    VALUE
    nvl(replace('$$TAG1$$$$TAG2$$$$TAG3$$' ...) ... '$$TAG4$$' = '12345678' ... INSTR('product1, product2,',',$$TAG5$$,') ... GET_CODE('$$TAG6$$', 'something')

    fiddle