Basically, I have an input String in PL/SQL that looks like the following:
input varchar2(100) := R985AD5768N2
This string is composed of several information preceded by a certain prefix each and concatenating each block. The prefix list is variable, in this example it is R
, D5
, G
and N
. Information may contain numbers and characters as long as there is no character sequence that is a prefix. Also, some prefixes may not necessarily appear inside the input String.
My goal is to use REGEXP_SUBSTR to filter the information of each block by its prefix, so having the following results from the example above:
R -> 985A
D5 -> 768
G -> NULL
N -> 2
My general idea is to use a pattern of the form 'R([^R|D5|G|N]+)(R|D5|G|N)?'
for R
respectively, meaning I require my sequence to start with the character R, then using all following characters until the next prefix (or the end of the string) is reached. With the subexpr argument of REGEXP_SUBSTR I can then directly choose the second part, being the information I am looking for.
I tried several approaches:
REGEXP_SUBSTR(input, 'R([^R|D5|G|N]+)(R|D5|G|N)?', 1, 1, 'c', 1) -- and other prefixes accordingly
is fine except for R
which results in only 98, I guess because 5 is appearing in the D5 prefix.
REGEXP_SUBSTR(input, 'R(\w+)(R|D5|G|N)', 1, 1, 'c', 1)
Using the positive approach, it will choose the longest approach, meaning the whole remainder of the input after its respective prefix.
I tried several variations of these, like replacing some brackets but still it seems like one key flaw is that the D5
prefix is not recognized as a literal in its respective bracket.
I saw some approaches that use some lookahead parameters but sadly these are not supported, as far as I know.
Any ideas on how to clarify this?
If you have specific terms and you want to find the first occurrence of each of them then you can hard-code those prefixes and use a non-greedy match looking for either a trailing prefix or an end-of-string:
SELECT value,
REGEXP_SUBSTR(value, '^R(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1) AS r,
REGEXP_SUBSTR(value, 'G(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1) AS g,
REGEXP_SUBSTR(value, 'N(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1) AS n,
REGEXP_SUBSTR(value, 'D5(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1) AS d5
FROM sample_data
Which, for the sample data:
CREATE TABLE sample_data (value) AS
SELECT 'R985AD5768N2' FROM DUAL;
Outputs:
VALUE | R | G | N | D5 |
---|---|---|---|---|
R985AD5768N2 | 985A | null | 2 | 768 |
Or, equivalently in PL/SQL:
DECLARE
value VARCHAR2(20) := 'R985AD5768N2';
r VARCHAR2(20) := REGEXP_SUBSTR(value, '^R(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1);
g VARCHAR2(20) := REGEXP_SUBSTR(value, 'G(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1);
n VARCHAR2(20) := REGEXP_SUBSTR(value, 'N(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1);
d5 VARCHAR2(20) := REGEXP_SUBSTR(value, 'D5(.*?)(R|G|N|D5|$)', 1, 1, NULL, 1);
BEGIN
DBMS_OUTPUT.PUT_LINE( 'R = ' || r);
DBMS_OUTPUT.PUT_LINE( 'G = ' || g);
DBMS_OUTPUT.PUT_LINE( 'N = ' || n);
DBMS_OUTPUT.PUT_LINE( 'D5 = ' || d5);
END;
/
Outputs:
R = 985A G = N = 2 D5 = 768
If you want all the prefixes and suffixes in the order they occur then you can use a recursive query and look for only the prefixes and find the start and end positions of those and then the suffixes will be the sub-strings between them.
WITH bounds (value, sspos, sepos, espos, eepos) AS (
SELECT value,
1,
2,
REGEXP_INSTR(value, 'R|G|N|D5', 2, 1, 0),
REGEXP_INSTR(value, 'R|G|N|D5', 2, 1, 1)
FROM sample_data
WHERE value LIKE 'R%'
UNION ALL
SELECT value,
espos,
eepos,
REGEXP_INSTR(value, 'R|G|N|D5', eepos, 1, 0),
REGEXP_INSTR(value, 'R|G|N|D5', eepos, 1, 1)
FROM bounds
WHERE eepos > 0
)
SELECT SUBSTR(value, sspos, sepos - sspos) AS prefix,
CASE WHEN espos = 0
THEN SUBSTR(value, sepos)
ELSE SUBSTR(value, sepos, espos - sepos)
END AS suffix
FROM bounds
Which outputs:
PREFIX | SUFFIX |
---|---|
R | 985A |
D5 | 768 |
N | 2 |