regexamazon-redshiftamazon-redshift-spectrum

How to extract the values assigned to x from this string?


I am trying to create a regex that I can use to extract out the values assigned to variable x in the following string:

(req.idf=6ca9a AND (req.ster=201 OR req.ster=st_home) AND (req.ste=hi OR req.ster=hijst_iuer OR ((req.ster=laHome OR req.ster=laHome_Jtre) AND (tax=IN OR taxIP=MX))) AND NOT ((x=u259 AND (x=66438 OR x=5423)) OR x=9853))AND(NOT x=28743)

I am trying to get the output in the following format, by getting the numerical values assigned to the variable x: 66438, 5423, 9853, 28743

So far, I have tried creating regex like: (x=[^)]+) and \bx=([^)]+)\b, but I am unable to get the desired output.

I will be using this in my query in Redshift using regexp_substr function.


Solution

  • The fifth argument of REGEXP_SUBSTR enables the usage of PCRE:

    parameters

    One or more string literals that indicate how the function matches the pattern. The possible values are the following:

    [...]

    • p – Interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect.

    This regular expression will allow you to use a positive lookbehind to get what you want:

    (?<=(\s|\()x=)\d+
    

    I've added a preceding '(' or ' ' in case your string is somewhat mutable, but it shouldn't be necessary for your example.

    You can check it here.