parsingwolfram-mathematicabnfbnfc

BNFC parser and bracket Mathematica like syntax


I played a bit with the BNF Converter and tried to re-engineer parts of the Mathematica language. My BNF had already about 150 lines and worked OK, until I noticed a very basic bug. Brackets [] in Mathematica are used for two different things

  1. expr[arg] to call a function
  2. list[[spec]] to access elements of an expression, e.g. a List

Let's assume I want to create the parser for a language which consists only of identifiers, function calls, element access and sequence of expressions as arguments. These forms would be valid

f[]
f[a]
f[a,b,c]
f[[a]]
f[[a,b]]

f[a,f[b]]
f[[a,f[x]]]

A direct, but obviously wrong input-file for BNFC could look like

entrypoints Expr ;

TSymbol.        Expr1 ::= Ident ;
FunctionCall.   Expr ::= Expr "[" [Sequence] "]" ;
Part.           Expr ::= Expr "[[" [Sequence] "]]" ;    
coercions Expr 1 ;

separator Sequence "," ;
SequenceExpr. Sequence ::= Expr ;

This BNF does not work for the last two examples of the first code-block.

The problem seems to be located in the created Yylex lexer file, which matches ] and ]] separately. This is wrong, because as can be seen in the last to examples, whether or not it's a closing ] or ]] depends on the context. So either you have to create a stack of braces to ensure the right matching or you leave that to the parser.

Can someone enlighten me whether it's possible to realize this with BNFC?

(Btw, other hints would be gratefully taken too)


Solution

  • Your problem is the token "]]". If the lexer collects this without having any memory of its past, it might be mistaken. So just don't do that!

    The parser by definition remembers its left context, so you can get it to do the bracket matching correctly.

    I would define your grammar this way:

    FunctionCall.   Expr ::= Expr "[" [Sequence] "]" ;
    Part.           Expr ::= Expr "[" "[" [Sequence] "]" "]" ;   
    

    with the lexer detecting only single "[" "]" as tokens.

    An odd variant:

    FunctionCall.   Expr ::= Expr "[" [Sequence] "]" ;
    Part.           Expr ::= Expr "[[" [Sequence] "]" "]" ; 
    

    with the lexer also detecting "[[" as a token, since it can't be mistaken.