regexsnowflake-cloud-data-platformposix-ere

Extract Substring using Regular expression aka Regex


I have Snowflake POSIX BRE engine and wants to extract below substring from given text. POSIX doesn't support lookbehind and lookahead. Please help with regex expression.

Ex 1: 2022 CKL04 TER-PRO:CPT-REFRESH PRD|NPR Substring Needed: CPT-REFRESH PRD

Ex 2 : 2022 CA4A TER-PRO:CPT-REFRESH PRD Substring Needed: CPT-REFRESH PRD

Ex 3 : 2022 CDDR4A TER-PRO:CPT-LEASING PRD|MC|LQPRI13 Substring Needed: CPT-LEASING PRD

Ex 4 : 2022 CAP04A TER-PRO:PRODUCT|NPR Substring Needed: PRODUCT

Ex 5 : 2022 CS040 TER-PRO:MS-PRD & SVC ANNUAL|NPR Substring Needed: MS-PRD & SVC ANNUAL

I need all of the characters after : and before first | or end of the string if no | is available.

I am able to extract with lookahead and lookbehind using this regex (?<=:).+?(?=||$) but I need solution for POSIX BRE without lookahead and lookbehind


Solution

  • This should work. Logic is to start at : then keep capturing everything after that unless we encounter a |, at which point we just stop there. I am using a capture group so we don't include the preceding :

    select regexp_substr(col,':([^|]+)',1,1,'e');