
Conditional RegEx to match a prefix and/or suffix, but not a word with neither


To save anyone from spending time on an alternative solution: I have to use regular expressions for this task.

I am trying to write a regular expression that matches a base word with the prefix "<" and/or the suffix ">", but does NOT match the base word when it has neither a prefix nor a suffix.

This is not simply a case of matching either a "<" or a ">", as these characters may change or be part of a group.

Example.

For this example, the group of base words is (base|text|word); in real life this list could be quite long.

Out of these candidates in the input text file...

text
<text
text>
<text>

...I want to match the following...

<text
text>
<text>

...but NOT match...

text

In plain English, my RegEx should match any of the base words prefixed with "<" and/or suffixed with ">", but not match the base word if it has neither a prefix nor a suffix.

As mentioned above, it is not a case of matching a literal "<" or ">", as these characters may be different or part of a group.

In all of my attempts, I cannot get this to work without also matching the base word when it appears alone, without a prefix or suffix.

As I became increasingly flustered while working on this problem, I failed to keep all my previous attempts. They would be of little value to anyone here anyway, as they all failed, and when I ran out of ideas I ended up guessing.

The following are some examples.

(text) = This will catch "text"

(\<)(text) = This will catch "<text"

(text)(\>) = This will catch "text>"

(\<)(text)(\>) = This will catch "<text>"

(\<|)(text)(|\>) = This is the closest, as it will catch "<text", "text>" and "<text>", but it will also catch "text".
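Looking at what each group captures shows why that last attempt also catches the bare word. This is a small sketch using Python's `re` module as a stand-in for the unknown engine (an assumption), with the suffix written as `\>`:

```python
import re

# The closest attempt from the question, with the suffix written as \> (assumed intent).
# Python's re treats \< and \> as literal "<" and ">".
closest = re.compile(r"(\<|)(text)(|\>)")

candidates = ["text", "<text", "text>", "<text>"]

for s in candidates:
    m = closest.fullmatch(s)
    # An empty string in a group means that group matched nothing. Both
    # "optional" groups can be empty at once, so the bare word still matches.
    print(repr(s), "->", m.groups() if m else None)
```

The output line for "text" is ('', 'text', ''): both groups matched the empty alternative, and nothing in the pattern requires at least one of them to be non-empty.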

I have also experimented with lookahead and lookbehind, but I was not able to use a lookbehind to jump back over the base word and check whether there was a prefix.
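For reference, a lookbehind does not need to jump over the word if it is placed before the word and combined, by alternation, with a lookahead placed after the word. This is a minimal sketch in Python's `re` (assumed engine); `WORDS` is a hypothetical stand-in for the real word list. The lookarounds are zero-width, so only the base word itself is returned, and Python's `re` only accepts fixed-width lookbehind, so a prefix group whose alternatives differ in length would be rejected there.

```python
import re

# Hypothetical stand-in for the real (base|text|word) word list.
WORDS = r"(?:base|text|word)"

# Branch 1: the lookbehind sits BEFORE the word and checks for "<".
# Branch 2: the lookahead sits AFTER the word and checks for ">".
# Both are zero-width, so only the base word itself ends up in the match.
lookaround = re.compile(rf"(?<=<){WORDS}\b|\b{WORDS}(?=>)")

for s in ["text", "<text", "text>", "<text>", "<textual"]:
    m = lookaround.search(s)
    print(repr(s), "->", m.group() if m else None)
```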

The only workaround I have found is to use two regexes: the first looks for (\<)(text) and the second for (text)(\>). However, this means running the regex twice, which is inefficient, and I really want to solve this problem properly.
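The two regexes can also be folded into a single alternation, which does not rely on any conditional support from the engine: one branch requires the prefix (with an optional suffix), the other requires the suffix. This is a sketch in Python's `re`; `WORDS`, `PREFIX` and `SUFFIX` are hypothetical placeholders for whatever the real patterns are.

```python
import re

# Hypothetical placeholders; each may itself be an alternation group.
WORDS = r"(?:base|text|word)"
PREFIX = r"<"
SUFFIX = r">"

# Branch 1: prefix + word + optional suffix  -> "<text" or "<text>"
# Branch 2: word + required suffix           -> "text>"
# A bare word satisfies neither branch. \b keeps "context>" from matching.
combined = re.compile(
    rf"(?:{PREFIX}){WORDS}\b(?:{SUFFIX})?"
    rf"|\b{WORDS}(?:{SUFFIX})"
)

print(combined.findall("text <text text> <text> context>"))
```

Because every group is non-capturing, `findall` returns the whole matches: "<text", "text>" and "<text>", while the bare "text" and the "text>" inside "context>" are skipped. The cost of this form is that the word list appears twice in the pattern.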

I have been provided with a standalone custom executable (Windows) to run these regexes, and I have no idea which regex engine it uses, but common regex syntax seems to work OK.

Thank you; any help would be gratefully received.


Solution

  • You can use

    (<)?text(?(1)>?|>)
    

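    Details:

      • (<)? - an optional capturing group with ID 1 that matches a "<" char
      • text - the base word
      • (?(1)>?|>) - a conditional construct: if Group 1 matched, match an optional ">" char, else match a ">" char (required)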

    If you need word boundaries (for example, so that the "text>" inside "context>" is not matched), use them like this:

    (<)?\btext\b(?(1)>?|>)
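A minimal sketch of both patterns in Python's `re`, which supports `(?(1)yes|no)` conditionals. Conditionals are not available in every engine: PCRE, .NET, Perl and Python's `re` support them, while JavaScript and Java's `java.util.regex` do not, so it is worth checking that the custom executable accepts this syntax. The `(?:base|text|word)` group is a hypothetical stand-in for the real word list.

```python
import re

# The answer's pattern: group 1 optionally captures "<"; the conditional
# (?(1)>?|>) makes ">" optional when group 1 matched and required otherwise.
basic = re.compile(r"(<)?text(?(1)>?|>)")

for s in ["text", "<text", "text>", "<text>"]:
    print(repr(s), "->", bool(basic.fullmatch(s)))

# Word-boundary variant, with a hypothetical word list in place of "text".
bounded = re.compile(r"(<)?\b(?:base|text|word)\b(?(1)>?|>)")

line = "text <text text> <text> context> <word"
print([m.group() for m in bounded.finditer(line)])
# -> ['<text', 'text>', '<text>', '<word']
```

Only the bare "text" prints False. In the second call, neither the bare "text" nor the "text>" inside "context>" is matched.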