To save anyone the time of offering an alternative solution: I have to use regular expressions for this task.
I am trying to write a regular expression that matches a base word with the prefix "<" and/or the suffix ">", but does NOT match the base word if it has neither prefix nor suffix.
This is not a simple case of matching either a "<" or a ">", as these characters may change or be part of a group.
Example.
For this example the group of base words is (base|text|word); in real life this list could be quite long.
Out of these candidates in the input text file...
text
<text
text>
<text>
...I want to match the following...
<text
text>
<text>
...but NOT match...
text
In spoken English, my RegEx is looking for any of the base words prefixed with a "<" and/or suffixed with a ">", but it should not match the base word if it has neither prefix nor suffix.
As mentioned above it is not a case of matching a literal "<" or a ">" as these characters may be different or part of a group.
Out of all the attempts I have made I cannot get this to work without catching the base word if it appears alone without a prefix or suffix.
As I became increasingly flustered while working on this problem I failed to retain all my previous attempts. My efforts will be of little value to anyone here as they all failed and when I ran out of ideas I ended up guessing.
The following are some examples.
(text)
= This will catch "text"
(\<)(text)
= This will catch "<text"
(text)(/>)
= This will catch "text>"
(\<)(text)(/>)
= This will catch "<text>"
(\<|)(text)(|/>)
= This is the closest as it will catch "<text" "text>" "<text>" but it will also catch "text".
I have also experimented with look-around and look-behind but I was not able to look-behind and jump over the base word to see if there was a prefix.
The only workaround is to use two RegEx. The first looks for (\<)(text)
and the second looks for (text)(/>)
However, this means running the RegEx twice, which is inefficient, and I really want to solve this problem.
I have been provided with a standalone custom executable (Windows) to run these RegEx and I have no idea what RegEx engine it uses, but common RegEx commands seem to work OK.
Thank you and any help would be gratefully received.
You can use
(<)?text(?(1)>?|>)
See the regex demo.
Details:
(<)? - Group 1 (optional): optionally matches a <
text - matches the string text
(?(1)>?|>) - a conditional construct: if Group 1 matched, an optional > char is matched; else, a > must be matched.
If you need to use word boundaries, use them like in
(<)?\btext\b(?(1)>?|>)
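To check the behaviour against the four candidates from the question, here is a minimal sketch in Python, whose re module supports the (?(1)...|...) conditional. The names BASE_WORDS and find_marked_words are made up for illustration, and the word list is the one from the question's example. Since the asker's engine is unknown and may not support conditionals, the sketch also includes a plain alternation that should behave the same way; that variant is an assumption added here, not part of the answer above.

```python
import re

# Hypothetical base-word list, taken from the question's example.
BASE_WORDS = ["base", "text", "word"]
WORDS = "|".join(re.escape(w) for w in BASE_WORDS)

# Conditional form: group 1 records whether the "<" prefix was matched.
# If it was, the ">" suffix is optional; if not, the ">" suffix is required.
CONDITIONAL = re.compile(rf"(<)?\b(?:{WORDS})\b(?(1)>?|>)")

# Alternation form for engines without conditionals (an assumption, not
# from the answer): either "<word" with optional ">", or "word>".
ALTERNATION = re.compile(rf"<\b(?:{WORDS})\b>?|\b(?:{WORDS})\b>")


def find_marked_words(text, pattern=CONDITIONAL):
    """Return every base word that carries a prefix and/or a suffix."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "text <text text> <text>"
    print(find_marked_words(sample))               # ['<text', 'text>', '<text>']
    print(find_marked_words(sample, ALTERNATION))  # ['<text', 'text>', '<text>']
```

The bare "text" at the start of the sample is skipped by both patterns, because without the prefix a ">" is mandatory. If the prefix or suffix is really a set of characters, the literal < and > can be swapped for a character class or group in the same positions.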