grammarisopegabnf

Is the alternative operator in ABNF match first or longest?


ISO 5234 says:

https://datatracker.ietf.org/doc/html/rfc5234#section-3.2

Elements separated by a forward slash ("/") are alternatives. Therefore,

     foo / bar

will accept <foo> or <bar>.

Using ISO 3339 for the ABNF definition for ISO 8601 datetime you have lines like

https://datatracker.ietf.org/doc/html/rfc3339#appendix-A

   date-century      = 2DIGIT
   date-decade       =  DIGIT  ; 0-9
   date-subdecade    =  DIGIT  ; 0-9
   date-year         = date-decade date-subdecade
   dateopt-century   = "-" / date-century
   datespec-year     = date-century / dateopt-century date-year

If you assume alternatives imply match first, 2024 would only match on 20 and ignore the rest. If you assume alternatives imply match longest, 2024 would match in it's entirety.

Based on that, it seems almost safe to assume that alternatives are indeed match longest. Is that correct?


Solution

  • Is the alternative operator in ABNF match first or longest?

    It's neither.

    BNF and its variants specify context free grammars. Unlike parsing expression grammars, context free grammars do not have a concept of ordered choice. There's no priority between alternatives, you choose whichever one leads to a successful parse.

    So let's say you have the following production:

    start = foo bar
    foo = "A" / "AB"
    bar = "C" / "BD"
    

    If we went with a "match first" strategy, the input "ABBD" would not be accepted because foo would only match "A" and the "BBD" left over could not be matched by bar. That's what would happen with a PEG, but for a CFG, "ABBD" is a word in the language defined by the above grammar.

    If we went with a "match longest" strategy, the input "ABD" would not be accepted because foo would match "AB", leaving a single "D" that couldn't be matched by bar. But again, "ABD" is part of the language, so we can't use that strategy either.

    The rule for context free grammars is that a word is part of the language if there exists any sequence of derivations that would derive the word from the grammar and there's no restrictions on which alternatives you're allowed to pick to come up with such a sequence.