JISON: How do I avoid "dog" being parsed as "do"?


I have the following JISON file (lite version of my actual file, but reproduces my problem):

%lex

%%

"do"                        return 'DO';
[a-zA-Z_][a-zA-Z0-9_]*      return 'ID';
"::"                        return 'DOUBLECOLON';
<<EOF>>                     return 'ENDOFFILE';

/lex

%%

start
    : ID DOUBLECOLON ID ENDOFFILE
    {$$ = {type: "enumval", enum: $1, val: $3}}
    ;

It is for parsing something like "AnimalTypes::cat", and it works fine for that input. But when it sees dog instead of cat, it assumes it's a DO instead of an ID. I can see why it does that, but how do I get around it? I've been looking at other JISON documents, but can't seem to spot the difference that (I assume) makes those work.

This is the error I get:

JisonParserError: Parse error on line 1:
PetTypes::dog
----------^
Expecting "ID", "enumstr", "id", got unexpected "DO"

Repro steps:

  1. Install jison-gho globally from npm (or modify code to use local version). I use Node v14.6.0.
  2. Save the JISON above as minimal-repro.jison
  3. Run jison -m es -o ./minimal.mjs ./minimal-repro.jison to create the parser.
  4. Create a file named test.mjs with code like:

     import Parser from "./minimal.mjs";
     Parser.parser.parse("PetTypes::dog");

  5. Run: node test.mjs

Edit: Updated with a reproducible example. Edit2: Simpler JISON


Solution

  • Unlike (f)lex, the jison lexer accepts the first matching pattern, even if it is not the longest matching pattern. You can get the (f)lex behaviour by using

     %option flex
    

    However, that significantly slows down the scanner.

    The original jison automatically appended \b to any pattern that ended with a literal string ending in an alphabetic character, to make it easy to match keywords without incurring this overhead. In jison-gho, this feature is disabled unless you specify

     %option easy_keyword_rules
    

    See https://github.com/zaach/jison/wiki/Deviations-From-Flex-Bison#user-content-literal-tokens.

    So either of those options will achieve the behaviour you expect.
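    Applied to the lexer from the question, the fix is a single directive. This is a sketch assuming the easy_keyword_rules option, so that "do" only matches when followed by a word boundary:

        %lex

        %option easy_keyword_rules

        %%

        "do"                        return 'DO';
        [a-zA-Z_][a-zA-Z0-9_]*      return 'ID';
        "::"                        return 'DOUBLECOLON';
        <<EOF>>                     return 'ENDOFFILE';

        /lex

    If you prefer not to change global lexer behaviour, you should also be able to make the boundary explicit in just that one rule by writing the pattern as a regex, e.g. do\b return 'DO'; (untested here, but it is the same \b the option would add for you).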