I thought I knew about regex... Here's the situation:
N-U0 U0-M1
M1-T9 T9-R10 R10-E19
E19-L100 L100-B
I have a String that contains groups (let's call them transitions) separated by whitespace (which may or may not include line breaks; I'm treating both the same, and there may be one or more whitespace characters). Each transition is composed of two parts (let's call them exiting and entering) separated by a hyphen. Each part is either a single character (N for exiting, B for entering) or one of the letters U, M, T, R, E or L followed by a number of one or more digits.
I want to run a regex match that will give me one object for each transition and then, for each object, I want access to each part of the transition by means of named capture groups.
These are the regexes I've written:
static RegExp regex = RegExp(
r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))\s+',
);
static RegExp exitingRegex = RegExp(
r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-',
);
static RegExp enteringRegex = RegExp(
r'-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))',
);
When I run
final matchList = regex.allMatches(
"N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n",
);
It doesn't work as I expected. It matches the first N, then the first U0, then the first M1, and so on up to the first L100, and finally L100-B. I was expecting it to match N-U0, then U0-M1, and so on. At least matchList.elementAt(0).namedGroup("exitingN") etc. works, but I want the exiting and entering parts together in one match.
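The behaviour above can be reproduced with a minimal sketch, assuming a standalone Dart script; the pattern is copied from the question (as a top-level final rather than a static field), while input and fullMatches are hypothetical names. Printing the full text of each match shows that every match comes from just one of three independent alternatives: a lone N, a letter-number code followed by -B, or a letter-number code followed by whitespace.

```dart
// Reproduction sketch: the pattern is copied verbatim from the question,
// but declared as a top-level final instead of a static field.
final regex = RegExp(
  r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))\s+',
);

// Same input as in the question (hypothetical constant name).
const input = "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n";

// Hypothetical helper: the full text (group 0) of every match, trimmed so
// the trailing whitespace consumed by the third alternative is not printed.
List<String> fullMatches(RegExp re, String text) =>
    re.allMatches(text).map((m) => m[0]!.trim()).toList();

void main() {
  // Prints [N, U0, M1, T9, R10, E19, L100, L100-B]
  print(fullMatches(regex, input));
}
```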
I tried wrapping the regex inside another group, both with and without ?: (to make it non-capturing), plus a few other tests, I think, but nothing worked.
Then I tested with exitingRegex alone, and it worked as expected, matching every exiting part. However, enteringRegex didn't work: it matched every exiting and every entering part except for N.
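To see why the two halves behave differently, here is a minimal sketch, assuming a standalone Dart script; both patterns are copied from the question (as top-level finals rather than static fields), while input and fullMatches are hypothetical names. In enteringRegex the leading hyphen belongs only to the first alternative, -B, so the second alternative matches any letter-number code anywhere, exiting or entering, and N is never matched because it is not in [UMTREL]. exitingRegex only appears to work: its trailing hyphen does constrain the code alternative, and the lone N alternative is harmless because N only ever occurs as an exiting part.

```dart
// Patterns copied from the question (top-level finals instead of static fields).
final exitingRegex = RegExp(
  r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-',
);
final enteringRegex = RegExp(
  r'-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))',
);

// Same input as in the question (hypothetical constant name).
const input = "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n";

// Hypothetical helper: the full text (group 0) of every match.
List<String> fullMatches(RegExp re, String text) =>
    re.allMatches(text).map((m) => m[0]!).toList();

void main() {
  // One match per transition: [N, U0-, M1-, T9-, R10-, E19-, L100-]
  print(fullMatches(exitingRegex, input));
  // Every code, exiting or entering, plus "-B": 13 matches, no N.
  print(fullMatches(enteringRegex, input));
}
```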
The only way I managed to make it work was to match with exitingRegex and then, for the entering parts, first run "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n".replaceAll(exitingRegex, "",) and then match with enteringRegex, but without the leading hyphen. This way, I get the exiting and entering parts separately, and I then have to join them by index.
What's going on?
Thanks in advance.
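The alternation operator | has the lowest precedence in a regex, so unless you group it, it splits the whole pattern into independent alternatives. Your first regex is really three separate patterns: a lone N, a letter-number code followed by -B, or a letter-number code followed by whitespace. To limit the branches separated by |, wrap them in a group. This group can be a capturing group, (...), or a non-capturing group, (?:...), depending on what you need. Your regex should then look like this (split across lines for readability; Dart's RegExp has no free-spacing flag, so write it on a single line in your code):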
(?:
(?<exitingN>N)
|
((?<exitingF>[UMTREL])(?<exitingNumber>[0-9]+))
)
-
(?:
(?<enteringB>B)
|
((?<enteringF>[UMTREL])(?<enteringNumber>[0-9]+))
)
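In Dart, the grouped pattern might be used like this; a minimal sketch, assuming Dart 3 (for records) and the input from the question, where transitionRegex and parseTransitions are hypothetical names. The pattern is split into adjacent raw string literals, which Dart concatenates into one single-line pattern. Only one branch of each group participates in a given match, so the named groups of the other branch are null and ?? picks whichever branch matched.

```dart
// The grouped pattern from above. Adjacent string literals are concatenated,
// so this is one single-line pattern (RegExp has no free-spacing flag).
final transitionRegex = RegExp(
  r'(?:(?<exitingN>N)|((?<exitingF>[UMTREL])(?<exitingNumber>[0-9]+)))'
  r'-'
  r'(?:(?<enteringB>B)|((?<enteringF>[UMTREL])(?<enteringNumber>[0-9]+)))',
);

// Same input as in the question (hypothetical constant name).
const input = "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n";

// Hypothetical helper: one (exiting, entering) record per transition.
// Groups in the branch that did not participate are null, so `??`
// falls back to the letter + number groups.
List<(String, String)> parseTransitions(String text) => [
      for (final m in transitionRegex.allMatches(text))
        (
          m.namedGroup('exitingN') ??
              "${m.namedGroup('exitingF')}${m.namedGroup('exitingNumber')}",
          m.namedGroup('enteringB') ??
              "${m.namedGroup('enteringF')}${m.namedGroup('enteringNumber')}",
        ),
    ];

void main() {
  // First line printed: N -> U0, last line printed: L100 -> B
  for (final t in parseTransitions(input)) {
    print('${t.$1} -> ${t.$2}');
  }
}
```

Because allMatches starts each new search at the end of the previous match, the whitespace between transitions is simply skipped, so the pattern does not need a trailing \s+.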
For an input of U0-M1, the grouped regex matches and returns the following groups:
Full match: U0-M1
Unnamed group: U0
exitingF: U
exitingNumber: 0
Unnamed group: M1
enteringF: M
enteringNumber: 1
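The U0 entry in that list comes from the unnamed capturing group that wraps exitingF and exitingNumber. A quick check, assuming a standalone Dart script (transitionRegex is a hypothetical name for the same grouped pattern): named groups are also numbered in order of their opening parentheses, so the pattern has eight groups, and the groups of the branch that did not participate are null. If you don't need the whole code as its own group, making those wrapping parentheses non-capturing, (?:...), removes the unnamed groups.

```dart
// The grouped pattern, written as adjacent raw strings (one single-line pattern).
final transitionRegex = RegExp(
  r'(?:(?<exitingN>N)|((?<exitingF>[UMTREL])(?<exitingNumber>[0-9]+)))'
  r'-'
  r'(?:(?<enteringB>B)|((?<enteringF>[UMTREL])(?<enteringNumber>[0-9]+)))',
);

void main() {
  final m = transitionRegex.firstMatch('U0-M1')!;
  print(m.groupCount); // 8
  // Group numbers: 1 exitingN, 2 (unnamed, U0), 3 exitingF, 4 exitingNumber,
  // 5 enteringB, 6 (unnamed, M1), 7 enteringF, 8 enteringNumber.
  for (var i = 0; i <= m.groupCount; i++) {
    print('$i: ${m.group(i)}');
  }
  print(m.namedGroup('exitingN')); // null: the N branch did not participate
}
```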
Note that I removed the unnecessary {1} quantifiers, because an expression always matches exactly one instance of itself by default.
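A tiny check of that claim, assuming a standalone Dart script (input and matchesOf are hypothetical names): with and without {1}, the letter-number pattern finds exactly the same matches.

```dart
// Same input as in the question (hypothetical constant name).
const input = "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n";

// Hypothetical helper: full text of every match of [pattern] in [text].
List<String> matchesOf(String pattern, String text) =>
    RegExp(pattern).allMatches(text).map((m) => m[0]!).toList();

void main() {
  final withQuantifier = matchesOf(r'[UMTREL]{1}[0-9]+', input);
  final withoutQuantifier = matchesOf(r'[UMTREL][0-9]+', input);
  print(withQuantifier.join(',') == withoutQuantifier.join(',')); // true
}
```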
Try it on regex101.com.