parsingabnf

Is the alternative operator in ABNF commutative?


Is the alternative operator (/) in Augmented Backus-Naur Form commutative?

For example, is s = a / b the same as s = b / a?


Solution

  • I haven't found any primary sources on BNF or ABNF which explicitly specify / semantics when both sides would yield valid matches. They don't allude to context-free grammars and their allowance for non-determinism either. If anyone knows of clarifying references please share.

    EDIT: Tony's answer points out RFC 3501 from 2003 specifies the semantics of ABNF alternation, at least as it's used in that document.

    RFC 5234: Augmented BNF for Syntax Specifications: ABNF (2008)

    The introduction contrasts BNF and ABNF (with emphasis added here):

    Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications. It balances compactness and simplicity with reasonable representational power. In the early days of the Arpanet, each specification contained its own definition of ABNF. This included the email specifications, RFC733 and then RFC822 , which came to be the common citations for defining ABNF. The current document separates those definitions to permit selective reference.

    The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges.

    "Selective reference" and "order-independence" may relate to alternation ordering semantics, but it's unclear.

    RFC 822: Standard for the Format of ARPA Internet Text Messages (1982)

    Unless I'm missing something, the cited RFCs don't specify / semantics either. Section 2.2 evades the problem.

    2.2. RULE1 / RULE2: ALTERNATIVES

    Elements separated by slash ("/") are alternatives. There- fore "foo / bar" will accept foo or bar.

    Various rule definitions show they recognize the practical importance of avoiding ambiguity. For example, here's how RFC 822 defines optional-field and its dependencies:

     optional-field =
                 /  "Message-ID"        ":"   msg-id
                 /  "Resent-Message-ID" ":"   msg-id
                 /  "In-Reply-To"       ":"  *(phrase / msg-id)
                 /  "References"        ":"  *(phrase / msg-id)
                 /  "Keywords"          ":"  #phrase
                 /  "Subject"           ":"  *text
                 /  "Comments"          ":"  *text
                 /  "Encrypted"         ":" 1#2word
                 /  extension-field              ; To be defined
                 /  user-defined-field           ; May be pre-empted
    
     extension-field =
                   <Any field which is defined in a document
                    published as a formal extension to this
                    specification; none will have names beginning
                    with the string "X-">
     
     user-defined-field =
                   <Any field which has not been defined
                    in this specification or published as an
                    extension to this specification; names for
                    such fields must be unique and may be
                    pre-empted by published extensions>
    

    The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference (Backus 1958)

    BNF comes from IAL notation. The paper introduces ̅o̅r "metalinguistic connective", which is intuitively related to /. However, it also dodges the ambiguous choice problem and presumably just uses it carefully.

    Recommendation

    Due to unspecified semantics my suggestion is to treat every possible match in an alternation rule as valid. When the grammar isn't carefully designed to avoid ambiguity this interpretation can result in multiple valid parse trees for the same input. Addressing ambiguous parses as they occur is safer than forging ahead with an unintentionally valid parse tree.

    Alternatively, if you have influence over how the grammar is specified you could consider a notation with clearer semantics. For example, Parsing Expression Grammar: A Recognition-Based Syntactic Foundation (Ford 2004) gives alternatives deterministic prioritized choice semantics (left-most match wins).