sqloracleoracle19cregexp-substr

Split string using REGEXP_SUBSTR in Oracle SQL gone wrong


Hello i have 3 strings that i want to split and they are all separated with delimiter

string_1 contains names of fields for example: ID, DUE_DATE
string_2 contains values of fields for example: 80781,2026-12-01
string_3 contains types of fields for example:  VARCHAR2,DATE
string_4 contains format of fields for example:  yyyy-mm-dd (for date field)

so i made this query

                with params as (
                  select  'ID|DUE_DATE'         as v_names
                         ,'80781|2026-12-01'    as v_values
                         ,'VARCHAR2|DATE'       as v_types
                         ,'|yyyy-mm-dd'         as v_types_format
                         ,'|' as v_delimiter
                  from dual
                )
                SELECT
                   REGEXP_SUBSTR(v_names, '[^ '||v_delimiter||']+', 1, level) AS v_name
                  ,REGEXP_SUBSTR(v_values, '[^ '||v_delimiter||']+', 1, level) AS v_value
                  ,REGEXP_SUBSTR(v_types, '[^ '||v_delimiter||']+', 1, level) AS v_type
                  ,REGEXP_SUBSTR(v_types_format, '[^ '||v_delimiter||']+', 1, level) as v_format
                FROM params
                CONNECT BY REGEXP_SUBSTR(v_names, '[^ '||v_delimiter||']+', 1, level) IS NOT NULL

I want to get this result ( v_types_format string should bring me null for ID and yyyy-mm-dd for DUE_DATE) Result i want

But i get this result

Wrond result


Solution

  • You are not matching zero-width terms as [^|]+ matches one-or-more non-pipe characters and in |yyyy-mm-dd the first match is after the pipe and not before.

    If you want to match zero-width terms then match each delimiter and the extract the preceding characters since the last match using (.*?)[|] and to match the last match (which won't have a pipe suffix but will have the end of the string) you can use (.*?)$ which you can combine to (.*?)([|]|$).

    Like this:

    with params as (
      select  'ID|DUE_DATE'         as v_names
             ,'80781|2026-12-01'    as v_values
             ,'VARCHAR2|DATE'       as v_types
             ,'|yyyy-mm-dd'         as v_types_format
             ,'|' as v_delimiter
      from dual
    )
    SELECT REGEXP_SUBSTR(v_names, '(.*?)(['||v_delimiter||']|$)', 1, level, NULL, 1)
             AS v_name
          ,REGEXP_SUBSTR(v_values, '(.*?)(['||v_delimiter||']|$)', 1, level, NULL, 1)
             AS v_value
          ,REGEXP_SUBSTR(v_types, '(.*?)(['||v_delimiter||']|$)', 1, level, NULL, 1)
             AS v_type
          ,REGEXP_SUBSTR(v_types_format, '(.*?)(['||v_delimiter||']|$)', 1, level, NULL, 1)
             as v_format
    FROM   params
    CONNECT BY LEVEL < REGEXP_COUNT(v_names, '(.*?)(['||v_delimiter||']|$)')
    

    Which outputs:

    V_NAME V_VALUE V_TYPE V_FORMAT
    ID 80781 VARCHAR2 null
    DUE_DATE 2026-12-01 DATE yyyy-mm-dd

    fiddle