How to extract specific string until blank space/next line from a text in Oracle?

I am trying to extract the following from the text field using Regrex in Oracle.

For example

"This is example,

and this really a example :h,j,j,j,j,

l //Updated question , as this letter is on the next line

now this is a disease:yes"

I am expecting a result as h,j,j,j,j,l, but if I use

REGEXP_SUBSTR(text_field,'example :[^:]+,') AS Result

I am getting example:h,j,j,j,j But I am not getting the last letter 'l' like above and I am guessing that's because it's on the next line.Also, if I want the string "disease:yes" only, that will be so helpful as well. Thank you much!

Solution

The result you are getting is because your pattern includes the word 'example' and ends with a comma, leaving out the ending 'l'. Try this form instead. Note the example is shown using a Common table Expression (CTE). The WITH statement creates the table called tbl which just sets up test data, kind of like a temp table. This is also a great way to set up data when asking a question. This form of the REGEXP_SUBSTR() function uses a captured group, which is the set of characters after the string 'example:' until the end of that line in the multi-line field. From this you should be able to get the other string you are after. Give it a go.

WITH tbl(text_field) AS (
  SELECT 'This is example,

and this really a example :h,j,j,j,j,l

now this is a disease:yes' FROM dual
)
SELECT REGEXP_SUBSTR(text_field,'example :(.*)', 1, 1, NULL, 1) AS Result
FROM tbl;

RESULT     
-----------
h,j,j,j,j,l

1 row selected.

Edit based on new info. Since that last letter could be on it's own line, you'll need to allow for the newline. Use the 'n' flag to REGEXP_REPLACE() which allows the newline to match in the usage of the dot (match any character) symbol in regex. We switch to REGEXP_REPLACE as we'll need to return multiple capture groups. Here the WITH sets up 2 rows, one with an embedded newline in the data and one without. The capture groups are (going left to right) 1-the data after "example :" and ending in a comma, 2-the optional newline and 3-the next single character. Then replace the entire data with captured groups 1 and 3 (leaving out the newline). NOTE this is very specific to the case of only 1 character on the following line.

WITH tbl(ID, text_field) AS ( 
  SELECT 1, 'This is example,
and this really a example :h,j,j,j,j,
l
now this is a disease:yes'  FROM dual UNION ALL
  SELECT 2, 'This is example,
and this really a example :h,j,j,j,j,l
now this is a disease:yes'  FROM dual
) 
SELECT ID,
       REGEXP_REPLACE(text_field, '.*example :(.*,)('||CHR(10)||')?(.).*', '\1\3', 1, 1, 'n') AS Result 
FROM tbl; 


        ID RESULT      
---------- ------------
         1 h,j,j,j,j,l 
         2 h,j,j,j,j,l 

2 rows selected.