I have a string that looks like this:
{"id":"1","name":"simpleword","type":"test"},{"id":"123","name":"f23ÜÜ","type":"prod"}
I want to do a REGEXP_SUBSTR_ALL
such that I can extract all "name" values into a list.
Something like this allows me to extract some names but not all. This is because some names include german characters like "Ü" which are not included with '\w+'
SELECT REGEXP_SUBSTR_ALL(ARRAY_TO_STRING(REGEXP_SUBSTR_ALL((ARRAY_TO_STRING(REGEXP_SUBSTR_ALL(ARTICLE_TAGS, '"name\\W+\\w+"'),',')), ':"\\w+'),','),'\\w+') FROM TABLE
For example, the expression above would give me this output:
[
"simpleword"
]
while my desired output is this:
[
"simpleword", "f23ÜÜ"
]
Just match everything that is not a "
:
with data(s) as (
select $${"id":"1","name":"simpleword","type":"test"},{"id":"123","name":"f23ÜÜ","type":"prod"}$$
)
select regexp_substr_all(s, 'name":"([^"]*)"', 1, 1, '', 1)
from data
-- [ "simpleword", "f23ÜÜ" ]
Also an alternative (as discussed in the comments, with the 'e' regex_parameters
:
with data(s) as (
select $${"id":"1","name":"simpleword","type":"test"},{"id":"123","name":"f23ÜÜ","type":"prod"}$$
)
select regexp_substr_all(s, 'name":"([^"]*)"', 1, 1, 'e')
from data