javascript jison

Parse individual productions in JISON


In JISON, is there a way to parse a string against an individual production? For instance, this primitive parser defines a master expressions production in terms of several other productions such as ary.

Right now this returns a parser that can parse expressions:

var parser = new jison.Parser(bnf);
var str = "A(1-3,5)&B(1,2,3)";
var result = parser.parse(str);   // this works

But I'd also like to parse strings that match individual productions such as ary:

var str = "1-3,5";
var result = parser.ary.parse(str);  // this does not work

Here's an example grammar with some of the JavaScript removed:

%start expressions

/* language grammar */
%%

expressions: e EOF {...}
           ;

assign: ID LPAR ary RPAR 
      ;

predicate: COUNT LPAR elist constraint RPAR { ... }
         ;

e : TN        { $$ = {}; }
  | predicate { $$ = $1; }
  | e '&' e   { $$ = _.merge({},$1,$3); }
  | e '!' e   { $$ = { "$or": [$1,$3]}; }
  | '?' e     { $$ = { "$not": $2 };    }
  | '{' e '}' { $$ = $2;                }
  | assign    { $$ = $1;                }
  ;

/* Continue from here. This is the only expr that keeps causing trouble... */
elist: elist SEMI e { ... }
     | e            { ... }
     ;

constraint:  SEMI comparator val { ... }
    | SEMI val {... }
    ;

ary: val { $$ = [$1]; }
   | ary "," val 
    ;


val: NUMBER            { $$ = +$1;                 }
   | '-' NUMBER        { $$ = - (+$2);             }
   | ID                { $$ = $1;                  }
   | STRING            { $$ = $1;                  }
   | NUMBER '-' NUMBER { $$ = _.range(+$1, +$3+1); }
   | '(' ary ')'       { $$ = $2;                  }
   ;

comparator: '$eq'
          | '$lte'
          | '$gte'
          | '$gt'
          | '$lt' { $$ = $1; }
          ;


Solution

  • It's straightforward to add this feature to a generated parser, without much consideration of how the parser generator works. All you need are some extra (fake) terminals, one for each non-terminal you'd like to start the parse with, and one new production for each new terminal.

    It also helps if you can inject a lexeme into the lexical stream without jumping through hoops. That is certainly possible with jison, because it lets you plug in your own custom lexer, which can inject the terminal and then pass calls through to the generated lexer. (If you need to use the generated lexer directly, it's still pretty easy as long as your lexer generator supports start conditions. You define a start condition for each injected lexeme, which immediately issues the desired terminal and then resets to the standard start condition in order to scan the actual input. There are many variations on this strategy, depending on the interfaces available to you.)

    Based on the above, the new grammar will look something like:

    start: old_start
         | EXPR_TOKEN expr
         | ARY_TOKEN ary
         | ...
    

    And then you just inject the correct terminal and the parser will automatically continue with the desired non-terminal.
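The injection step could be sketched roughly as follows. This is only an illustration, not code from jison itself: makeInjectingLexer and makeStubLexer are hypothetical names, the stub stands in for the generated lexer so the snippet runs on its own, and the sketch assumes the parser only needs the custom-scanner interface (setInput() and lex(), plus fields such as yytext that are mirrored from the wrapped lexer).

```javascript
// Hypothetical helper: wraps any lexer exposing setInput()/lex() so that the
// first token reported is a fake "start selector" terminal such as 'ARY_TOKEN'.
function makeInjectingLexer(inner, startToken) {
  return {
    startToken: startToken,            // may be reassigned between parses
    setInput: function (input, yy) {
      this.pending = this.startToken;  // emitted once, before the real input
      this.yytext = '';
      this.yyleng = 0;
      this.yylineno = 0;
      this.yylloc = { first_line: 1, last_line: 1, first_column: 0, last_column: 0 };
      inner.setInput(input, yy);
      return this;
    },
    lex: function () {
      if (this.pending) {
        var injected = this.pending;
        this.pending = null;
        return injected;
      }
      var token = inner.lex();
      // Mirror the fields a parser typically reads from its lexer.
      this.yytext = inner.yytext;
      this.yyleng = inner.yyleng;
      this.yylineno = inner.yylineno;
      this.yylloc = inner.yylloc || this.yylloc;
      return token;
    }
  };
}

// Stand-in for the generated lexer, used only for this demonstration.
function makeStubLexer() {
  var tokens = [];
  return {
    setInput: function (input) { tokens = input.match(/\d+|[-,]/g) || []; },
    lex: function () {
      if (tokens.length === 0) { return 'EOF'; }
      this.yytext = tokens.shift();
      this.yyleng = this.yytext.length;
      return /^\d+$/.test(this.yytext) ? 'NUMBER' : this.yytext;
    }
  };
}

var lexer = makeInjectingLexer(makeStubLexer(), 'ARY_TOKEN');
lexer.setInput('1-3,5');
var seen = [];
var tok;
do {
  tok = lexer.lex();
  seen.push(tok);
} while (tok !== 'EOF');
console.log(seen.join(' '));
```

With a real jison parser, the wrapper would presumably be assigned to parser.lexer in place of the generated lexer, with startToken set to the terminal that selects the desired non-terminal; leaving startToken unset passes the input through unchanged.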

    For parser generators like Jison, where the grammar is data, this transformation can easily be done automatically, so you don't even need to decide in advance which non-terminals you're interested in.

    (Note: This lets you select a target non-terminal, not a target production. If you want to target a single production of a non-terminal with more than one production, you'd need to duplicate that production into the new start symbol.)
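The automatic transformation mentioned above might look something like the following sketch when the grammar is held as a plain object. It assumes jison's JSON grammar layout (a bnf map from non-terminal names to lists of [symbols, action] pairs, with the start symbol read from a start key) and that the end-of-input token is called EOF, as in the question's grammar. The names addEntryPoints, __entry and the START_ prefix are made up for the illustration.

```javascript
// Hypothetical helper: returns a copy of a grammar object with a new start
// symbol offering one extra alternative per requested non-terminal.
function addEntryPoints(grammar, names) {
  // Assumption: the original start symbol is named explicitly, or is the
  // first non-terminal defined in the bnf map.
  var oldStart = grammar.start || grammar.startSymbol || Object.keys(grammar.bnf)[0];
  var bnf = Object.assign({}, grammar.bnf);

  // Keep the original entry point as the first alternative.
  var alternatives = [[oldStart, 'return $1;']];

  // One fake terminal and one production per selectable non-terminal.
  (names || Object.keys(grammar.bnf)).forEach(function (name) {
    alternatives.push(['START_' + name + ' ' + name + ' EOF', 'return $2;']);
  });

  bnf.__entry = alternatives;
  return Object.assign({}, grammar, { bnf: bnf, start: '__entry' });
}

// Small grammar used only to show the shape of the result.
var grammar = {
  bnf: {
    expressions: [['e EOF', 'return $1;']],
    e: [['ary', '$$ = $1;']],
    ary: [['val', '$$ = [$1];'], ['ary , val', '$$ = $1.concat([$3]);']],
    val: [['NUMBER', '$$ = +$1;']]
  }
};

var extended = addEntryPoints(grammar, ['e', 'ary']);
console.log(JSON.stringify(extended.bnf.__entry, null, 2));
```

The extended object would then be handed to the parser generator instead of the original grammar, and the matching START_ terminal has to be injected ahead of the real input as described above. The original grammar object is left untouched, so both parsers can coexist.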