I have a regex that has multiple match groups.
How in snowflake do I specify which match group to return?
I'm using REGEXP_SUBSTR
but happy to use alternatives if they work better.
TL;DR: Can't do exactly that, but you can the 'e'
option and use non-capturing groups with (?:re)
.
So to clarify, it seems Neil is asking for something that would return word
for
select regexp_substr('bird is the word','(bird) (is) (the) (word)',1,4)
Unfortunately, I don't think Snowflake supports exactly this functionality today. There is an 'e'
(extract) parameter to REGEXP_SUBSTR, which allows you to extract a group only, but it always extracts the first group. The reason for that is that the occurrence
parameter today means occurrence of the entire regexp in the string. Example
select regexp_substr('bird is cows are','([a-z]*) (is|are)',1,2,'e');
=> cows
You can achieve what you want by not using grouping for the groups before what you want, e.g.
select regexp_substr('bird is the word','bird (is) (the) (word)',1,1,'e');
-> is
select regexp_substr('bird is the word','bird is the (word)',1,1,'e');
-> word
However, that doesn't work if you want to use grouping for expressing alternatives, e.g.
select regexp_substr('cow is the word','(bird|cow) is the (word)',1,1,'e');
-> cow
Still, I see there would be value in providing an option to extract a particular group number, will raise it with Snowflake development :)