sqlregexoracle-databaseregexp-substr

How to parse a specific part of string that starts with a specific str to the first space in Oracle


I have dataset like this:

SELECT 
    1 as text_id,
    'The first is A.ACCOUNT_ID and the second one is B.IDENTITY_NO and third one is plate_number .' as full_text
FROM DUAL
UNION
SELECT 
    2,
    'The first is ARC.PREV_RECORD and the second one is ARC.NEXT_RECORD .'
FROM DUAL

I should parse all the phrases starts with " is " ; ends with first space character for each row.

So the result that I want to achieve from full_text is:

text_id parsed_part
1 A.ACCOUNT_ID
1 B.IDENTITY_NO
1 plate_number
2 ARC.PREV_RECORD
2 ARC.NEXT_RECORD

It could be less or more than 3 phrases so the row count of result could be change.

I tried to reverse text first and find the part between " si " and space but couldn't succeed

reverse(regexp_substr(reverse(full_text), ' si ([^_]*) ',1, 1))

Solution

  • With a lateral join on the unfolding of the dotted words via a connect by.

    WITH DATA AS (
    SELECT 1 as textid, 'The first is A.ACCOUNT_ID and the second one is B.IDENTITY_NO and third one is plate_number .' as full_text FROM DUAL
    UNION ALL SELECT 2, 'The first is ARC.PREV_RECORD and the second one is ARC.NEXT_RECORD .' FROM DUAL
    ) 
    SELECT t.textid, w.word
    FROM DATA t
    CROSS JOIN LATERAL (
      SELECT level AS lvl, REGEXP_SUBSTR(full_text, ' is ([A-Z._]+)',1, LEVEL, 'i', 1) AS word 
      FROM DUAL
      CONNECT BY LEVEL <= REGEXP_COUNT(full_text, ' is ([A-Z._]+)', 1, 'i')
    ) w
    ORDER BY t.textid;
    
    TEXTID | WORD           
    -----: | :--------------
         1 | A.ACCOUNT_ID   
         1 | B.IDENTITY_NO  
         1 | plate_number   
         2 | ARC.PREV_RECORD
         2 | ARC.NEXT_RECORD
    

    db<>fiddle here