plsqloracle19cregexp-replace

Remove empty params from URL


URL like this

https://test.com/test?username=john&x1=1234&x2=5677

I want to remove params if any of them looks like param=.

So for example:

I almost achieve this by doing positive lookahead, but Oracle does not support it.

Any ideas?


Solution

    1. Split the string into the path and query parameter components;
    2. Double up the ampersands in the query parameter component and prepend and append an ampersand so that each query parameter is surrounded by its own ampersand;
    3. Replace any query parameter that matches the regular expression &[^&]+=&
    4. Remove the duplicated ampersands and the leading and trailing ampersands from the remaining query parameters.
    5. Concatenate the path component with the remaining query parameters.

    Like this:

    SELECT column_name,
           SUBSTR(column_name, 1, INSTR(column_name, '?'))
           ||
           TRIM(
             BOTH '&' FROM
             REPLACE(
               REGEXP_REPLACE(
                 '&'
                 || REPLACE(
                   SUBSTR(column_name, INSTR(column_name, '?') + 1),
                   '&',
                   '&&'
                 )
                 || '&',
                 '&[^&]+=&'
               ),
               '&&',
               '&'
             )
           ) AS without_empty
    FROM   table_name
    

    Which, for the sample data:

    CREATE TABLE table_name (column_name) AS
      SELECT 'https://test.com/test?username=&x1=1234&x2=5677' FROM DUAL UNION ALL
      SELECT 'https://test.com/test?username=john&x1=1234&x2=' FROM DUAL UNION ALL
      SELECT 'https://test.com/test?a=&b=&c=' FROM DUAL
    

    Outputs:

    COLUMN_NAME WITHOUT_EMPTY
    https://test.com/test?username=&x1=1234&x2=5677 https://test.com/test?x1=1234&x2=5677
    https://test.com/test?username=john&x1=1234&x2= https://test.com/test?username=john&x1=1234
    https://test.com/test?a=&b=&c= https://test.com/test?

    If you want it in PL/SQL then just use exactly the same code (and could wrap it in a function if it makes it simpler):

    DECLARE
      FUNCTION simplify_url(i_url VARCHAR2) RETURN VARCHAR2
      IS
      BEGIN
        RETURN  SUBSTR(i_url, 1, INSTR(i_url, '?'))
           ||
           TRIM(
             BOTH '&' FROM
             REPLACE(
               REGEXP_REPLACE(
                 '&'
                 || REPLACE(
                   SUBSTR(i_url, INSTR(i_url, '?') + 1),
                   '&',
                   '&&'
                 )
                 || '&',
                 '&[^&]+=&'
               ),
               '&&',
               '&'
             )
           );
      END;
    BEGIN
      FOR r IN (SELECT column_name FROM table_name) LOOP
        DBMS_OUTPUT.PUT_LINE('ORIGINAL: ' || r.column_name);
        DBMS_OUTPUT.PUT_LINE('SIMPLIFIED: ' || simplify_url(r.column_name));
      END LOOP;
    END;
    /
    

    Which outputs:

    ORIGINAL: https://test.com/test?username=&x1=1234&x2=5677
    SIMPLIFIED: https://test.com/test?x1=1234&x2=5677
    ORIGINAL: https://test.com/test?username=john&x1=1234&x2=
    SIMPLIFIED: https://test.com/test?username=john&x1=1234
    ORIGINAL: https://test.com/test?a=&b=&c=
    SIMPLIFIED: https://test.com/test?
    

    fiddle