I have a long list of formulas in Microsoft SSIS that I have to translate into SQL queries. All the formulas of my list are basically made with the following rules:
PART_1 ? PART_2 : PART_3
means if PART_1 (test) is true then PART_2, if false then PART_3&&
means AND (for different conditions in PART_1 (test)My target is to extract in Notepad++ the PART_1, PART_2 and PART_3 from most external expressions, like the following examples:
Expression: A>0 ? B : C
A>0
B
C
Expression: A>0 && (A>10 ? A : -A) ? B : C
A>0 && (A>10 ? A : -A)
B
C
Expression: A>0 ? B : (C>14 ? A*14 : -4)
A>0
B
(C>14 ? A*14 : -4)
Expression: A>0 ? (A>0 ? 2-B : C) : (C>14 ? A*14 : -4)
A>0
(A>0 ? 2-B : C)
(C>14 ? A*14 : -4)
In Notepad++ I have made the following REGEX to extract the three parts
PART_1 (test)
.*?\?
PART_2 (if true)
\?.*?:
PART_3 (if false)
:.*
without flag the ". matches newline" option.
The main problem is that they match (of course) everything and not only the most external expressions so I can't use them with nested formulas. How can I fix this problem?
The basic problem comes down to "balanced parentheses", which can be matched with
\((?>[^()]|(?R))*\)
See here:
Regular expression to match balanced parentheses
This expression captures an whole expression into the 3 capture groups
^((?:\((?>[^()]|(?R))*\)|[^?\(])+?)\s*\?\s*((?:\((?>[^()]|(?R))*\)|[^:\(\s])+?)\s*:\s*((?:\((?>[^()]|(?R))*\)|[^\$\s])+)$
Yes, it can be made simpler, but we use this to separate the components into the 3 expressions for the 3 components
condition
" (part 1)^(?:(?:\((?>[^()]|(?R))*\)|[^?\(])+?)(?=\s*\?)
Assumes starts at the beginning of a line.
(?=\s*\?)
is a positive-lookahead.
value_if_true
" (part 2)^((?:\((?>[^()]|(?R))*\)|[^?\(])+?)\s*\?\s*\K((?:\((?>[^()]|(?R))*\)|[^:\(\s])+?)(?=\s*:)
Basically, this one is made up from the "condition" regex then a \K
which throws it away (so it's not captured) and then what we want to match followed by a lookahead.
We use the \K
here because the regex engine (boost) Notepad++ I don't think supports infinite repetition in "positive-lookbehind".
value_if_false
" (part 3):\s*\K((?:\((?>[^()]|(?R))*\)|[^\$\s])+)$
Again using \K
here to throw that bit away.