ISO 5234 says:
https://datatracker.ietf.org/doc/html/rfc5234#section-3.2
Elements separated by a forward slash ("/") are alternatives. Therefore,
foo / bar
will accept <foo> or <bar>.
Using ISO 3339 for the ABNF definition for ISO 8601 datetime you have lines like
https://datatracker.ietf.org/doc/html/rfc3339#appendix-A
date-century = 2DIGIT date-decade = DIGIT ; 0-9 date-subdecade = DIGIT ; 0-9 date-year = date-decade date-subdecade dateopt-century = "-" / date-century datespec-year = date-century / dateopt-century date-year
If you assume alternatives imply match first, 2024
would only match on 20
and ignore the rest. If you assume alternatives imply match longest, 2024
would match in it's entirety.
Based on that, it seems almost safe to assume that alternatives are indeed match longest. Is that correct?
Is the alternative operator in ABNF match first or longest?
It's neither.
BNF and its variants specify context free grammars. Unlike parsing expression grammars, context free grammars do not have a concept of ordered choice. There's no priority between alternatives, you choose whichever one leads to a successful parse.
So let's say you have the following production:
start = foo bar
foo = "A" / "AB"
bar = "C" / "BD"
If we went with a "match first" strategy, the input "ABBD" would not be accepted because foo
would only match "A" and the "BBD" left over could not be matched by bar
. That's what would happen with a PEG, but for a CFG, "ABBD" is a word in the language defined by the above grammar.
If we went with a "match longest" strategy, the input "ABD" would not be accepted because foo
would match "AB", leaving a single "D" that couldn't be matched by bar
. But again, "ABD" is part of the language, so we can't use that strategy either.
The rule for context free grammars is that a word is part of the language if there exists any sequence of derivations that would derive the word from the grammar and there's no restrictions on which alternatives you're allowed to pick to come up with such a sequence.