regex dart capturing-group named-captures

allMatches not matching several groups exactly


I thought I knew about regex... Here's the situation:

N-U0 U0-M1
M1-T9 T9-R10 R10-E19
E19-L100 L100-B

I have a String that contains groups (let's call them transitions) separated by whitespace (may or may not be line breaks, I'm treating both equally; also, may be one or more characters). Each group is composed of two groups (let's call them exiting and entering) separated by a hyphen. Each of these is composed of either a single character (N or B, respectively) or a specific character and a one-or-many-digits number.

I want to run a regex match that will give me one object for each transition and then, for each object, I want access to each part of the transition by means of named capture groups.

These are the regexes I've written:

static RegExp regex = RegExp(
  r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))\s+',
);

static RegExp exitingRegex = RegExp(
  r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-',
);

static RegExp enteringRegex = RegExp(
  r'-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))',
);

When I run

final matchList = regex.allMatches(
  "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n",
);

It doesn't work as I expect it to. It matches the first N, then the first U0, then the first M1, and so on until the first L100 and the B. I was expecting it to match N-U0, then U0-M1 and so on. At least matchList.elementAt(0).namedGroup("exitingN") etc works, but I wanted the exiting and the entering parts together.

I tried to add the regex inside another group and I tried both with and without ?: (to make it non-capturing), plus a few other tests, I think, but nothing worked.

Then I tested with exitingRegex only and it worked as expected, matching every exiting. However, enteringRegex didn't work. It matched every exiting and every entering except for N.

The only way I managed to make it work was to match with exitingRegex and then, for the entering, I had to first use "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n".replaceAll(exitingRegex, "",) and then match with enteringRegex but without the leading hyphen. This way, I got the exiting and the entering separately, which I have to join later by index.

What's going on?

Thanks in advance.


Solution

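  • The | operator has the lowest precedence in a regex, so without grouping, each alternative runs all the way to the next | or to the end of the pattern. Your first regex is therefore read as three separate alternatives: a lone N; or a letter and number followed by -B; or a letter and number followed by whitespace. That is why you get N, U0, M1 and so on as individual matches. To limit the branches separated by |, wrap them in a group. This group can be a capturing (()) or non-capturing group ((?:)), depending on what you need. That said, your regex should look like this: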

    (?:
      (?<exitingN>N)
    |
      ((?<exitingF>[UMTREL])(?<exitingNumber>[0-9]+))
    )
    -
    (?:
      (?<enteringB>B)
    |
      ((?<enteringF>[UMTREL])(?<enteringNumber>[0-9]+))
    )
    

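    For an input of U0-M1, this regex matches the whole string and returns the following named groups:

      exitingF: U
      exitingNumber: 0
      enteringF: M
      enteringNumber: 1

    exitingN and enteringB do not participate in this match, and the two unnamed groups capture U0 and M1.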

    Note that I removed the unnecessary {1} quantifiers, because an expression matches exactly one instance of itself by default.

    Try it on regex101.com.
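
    For completeness, here is a minimal Dart sketch of how the grouped pattern could be used with allMatches. It is an illustration under a few assumptions, not the only way to do it: the names transition and describeTransitions are hypothetical, the pattern is written on one line (split across adjacent raw strings) because Dart's RegExp, as far as I know, has no free-spacing mode, and the outer unnamed groups are dropped since the sketch only reads the named ones.

```dart
// The grouped pattern, with each alternation wrapped in a non-capturing
// group so that the hyphen sits between the two alternations.
final transition = RegExp(
  r'(?:(?<exitingN>N)|(?<exitingF>[UMTREL])(?<exitingNumber>[0-9]+))'
  r'-'
  r'(?:(?<enteringB>B)|(?<enteringF>[UMTREL])(?<enteringNumber>[0-9]+))',
);

/// Hypothetical helper: renders each transition as "exiting->entering",
/// reading both halves from the named groups of a single match.
List<String> describeTransitions(String input) {
  return transition.allMatches(input).map((m) {
    // namedGroup returns null when the group did not participate,
    // so fall back to the letter-plus-number branch in that case.
    final exiting = m.namedGroup('exitingN') ??
        '${m.namedGroup('exitingF')}${m.namedGroup('exitingNumber')}';
    final entering = m.namedGroup('enteringB') ??
        '${m.namedGroup('enteringF')}${m.namedGroup('enteringNumber')}';
    return '$exiting->$entering';
  }).toList();
}

void main() {
  const input = 'N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n';
  describeTransitions(input).forEach(print);
}
```

    With the input from the question, this yields one match per transition (seven in total), from N->U0 through L100->B, with the exiting and entering parts available on the same match object. The whitespace between transitions does not need to be part of the pattern, because allMatches simply skips over anything that does not match.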