[SOLVED] regex expression to include german characters

regex expression to include german characters

I have a string that looks like this:

{"id":"1","name":"simpleword","type":"test"},{"id":"123","name":"f23ÜÜ","type":"prod"}

I want to do a REGEXP_SUBSTR_ALL such that I can extract all "name" values into a list.

Something like this allows me to extract some names but not all. This is because some names include german characters like "Ü" which are not included with '\w+'

SELECT REGEXP_SUBSTR_ALL(ARRAY_TO_STRING(REGEXP_SUBSTR_ALL((ARRAY_TO_STRING(REGEXP_SUBSTR_ALL(ARTICLE_TAGS, '"name\\W+\\w+"'),',')), ':"\\w+'),','),'\\w+') FROM TABLE

For example, the expression above would give me this output:

[
    "simpleword"
]

while my desired output is this:

[
    "simpleword", "f23ÜÜ"
]

Solution

Just match everything that is not a ":

with data(s) as (
    select $${"id":"1","name":"simpleword","type":"test"},{"id":"123","name":"f23ÜÜ","type":"prod"}$$
)

select regexp_substr_all(s, 'name":"([^"]*)"', 1, 1, '', 1)
from data

-- [   "simpleword",   "f23ÜÜ" ]

Also an alternative (as discussed in the comments, with the 'e' regex_parameters:

with data(s) as (
    select $${"id":"1","name":"simpleword","type":"test"},{"id":"123","name":"f23ÜÜ","type":"prod"}$$
)

select regexp_substr_all(s, 'name":"([^"]*)"', 1, 1, 'e')
from data