I have dataset like this:
SELECT
1 as text_id,
'The first is A.ACCOUNT_ID and the second one is B.IDENTITY_NO and third one is plate_number .' as full_text
FROM DUAL
UNION
SELECT
2,
'The first is ARC.PREV_RECORD and the second one is ARC.NEXT_RECORD .'
FROM DUAL
I should parse all the phrases starts with " is " ; ends with first space character for each row.
So the result that I want to achieve from full_text
is:
text_id | parsed_part |
---|---|
1 | A.ACCOUNT_ID |
1 | B.IDENTITY_NO |
1 | plate_number |
2 | ARC.PREV_RECORD |
2 | ARC.NEXT_RECORD |
It could be less or more than 3 phrases so the row count of result could be change.
I tried to reverse text first and find the part between " si " and space but couldn't succeed
reverse(regexp_substr(reverse(full_text), ' si ([^_]*) ',1, 1))
With a lateral join on the unfolding of the dotted words via a connect by
.
WITH DATA AS ( SELECT 1 as textid, 'The first is A.ACCOUNT_ID and the second one is B.IDENTITY_NO and third one is plate_number .' as full_text FROM DUAL UNION ALL SELECT 2, 'The first is ARC.PREV_RECORD and the second one is ARC.NEXT_RECORD .' FROM DUAL ) SELECT t.textid, w.word FROM DATA t CROSS JOIN LATERAL ( SELECT level AS lvl, REGEXP_SUBSTR(full_text, ' is ([A-Z._]+)',1, LEVEL, 'i', 1) AS word FROM DUAL CONNECT BY LEVEL <= REGEXP_COUNT(full_text, ' is ([A-Z._]+)', 1, 'i') ) w ORDER BY t.textid;
TEXTID | WORD -----: | :-------------- 1 | A.ACCOUNT_ID 1 | B.IDENTITY_NO 1 | plate_number 2 | ARC.PREV_RECORD 2 | ARC.NEXT_RECORD
db<>fiddle here