
How to make \w token act non-greedy in this RegEx?


I have a text string that contains a repeating pattern; each repetition is separated from the next by a . (dot) character. The pattern may end in _123 (an underscore followed by a sequence of digits), and I want to capture those digits in a dedicated capturing group.

The RegEx (ECMAScript) I have built mostly works:
https://regex101.com/r/iEzalU/1

/(label(:|\+))?(\w+)(?:_(\d+))?/gi

However, the (\w+) part is greedy. Because \w also matches underscores and digits, it consumes the _123 suffix, so the optional (?:_(\d+))? part never captures anything.

[Screenshot: regex with greedy behavior]
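A minimal JavaScript sketch of the greedy behavior. It assumes a hypothetical input string, `label:alpha_123.beta.label+gamma_45`, since the actual test string behind the regex101 link is not reproduced here. Group 3 keeps the `_123`/`_45` suffix and group 4 stays `undefined` for every match.

```javascript
// Hypothetical sample input: dot-separated items, some with a "label:" or
// "label+" prefix and some ending in "_<digits>".
const input = "label:alpha_123.beta.label+gamma_45";

// The question's original pattern, with a greedy (\w+).
const greedy = /(label(:|\+))?(\w+)(?:_(\d+))?/gi;

// Collect the name (group 3) and the trailing digits (group 4) of each match.
const results = [...input.matchAll(greedy)].map((m) => ({
  name: m[3],
  digits: m[4],
}));

// Each name still contains its "_digits" suffix; digits is always undefined.
console.log(results);
```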

Adding a ? to make \w+ lazy (\w+?) stops it from swallowing the suffix, but now every character matched by \w becomes a separate match.

[Screenshot: regex with non-greedy behavior]
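A minimal sketch of why the lazy version splits the names, again using the hypothetical input `label:alpha_123.beta.label+gamma_45`. Nothing after the optional `(?:_(\d+))?` group requires the match to continue, so the lazy `\w+?` stops after a single character. Each match is one character, or one character plus a `_digits` suffix when that suffix follows immediately.

```javascript
// Hypothetical sample input (not the string from the original regex101 demo).
const input = "label:alpha_123.beta.label+gamma_45";

// Lazy (\w+?) with nothing after it that forces it to grow.
const lazy = /(label(:|\+))?(\w+?)(?:_(\d+))?/gi;

// Group 3 of every match is a single character.
const lazyNames = [...input.matchAll(lazy)].map((m) => m[3]);
console.log(lazyNames.join(" ")); // → a l p h a b e t a g a m m a
```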

How can I write this regex so that \w+ stays greedy but still does not consume the _(\d+) part?
Alternatively, is it possible to get all the characters matched by the lazy \w+? as a single match (some capturing/non-capturing group magic)?


Solution

  • When creating regular expressions, it is a good idea to think about your expected match boundaries.

    You know you need to match substrings inside a longer string, so end-of-string anchors such as $ and \z can be ruled out at once (JavaScript regexes do not support \z anyway). Digits, letters and underscores are all word characters matched by \w, so you want to match everything up to the first non-word character (or, potentially, up to the end of the string).

    I suggest using

    (label[:+])?(\w+?)(?:_(\d+))?\b
    

    See the regex demo
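    A minimal JavaScript sketch of the suggested pattern, assuming the same kind of hypothetical input as above (`label:alpha_123.beta.label+gamma_45`). The trailing \b forces the lazy (\w+?) to keep expanding until the match ends right before a non-word character (here the dot) or at the end of the string. The optional _(\d+) group therefore gets the first chance at the suffix. Note that with [:+] there is one group fewer, so the name is group 2 and the digits are group 3.

    ```javascript
    // Hypothetical sample input: dot-separated items with optional prefix/suffix.
    const input = "label:alpha_123.beta.label+gamma_45";

    // Suggested pattern: lazy name, optional "_digits", then a word boundary.
    const pattern = /(label[:+])?(\w+?)(?:_(\d+))?\b/gi;

    const items = [...input.matchAll(pattern)].map((m) => ({
      prefix: m[1], // "label:" or "label+", undefined when absent
      name: m[2],   // the name without the numeric suffix
      digits: m[3], // trailing digits, undefined when absent
    }));

    console.log(items);
    ```

    Because \w also matches _, a name such as v_1_2 would come out as name v_1 and digits 2: only the last _digits run directly before the boundary goes into group 3.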

    Details: