
Why is nearley-unparse not including tokens in sample strings generated from a compiled Nearley grammar when using Moo as the tokenizer/lexer?


I'm not sure whether this is a problem with the Nearley.js library, the Moo tokenizer/lexer, or my own code, so I might need to submit this as an issue to the Nearley repo. All the referenced files can be found in this Gist.

I am attempting to write a Nearley grammar that will parse a list of homework problems for one of my classes. The problems are in problems.txt and look like this:

Section 5.2 (Due 4/23)- #3, 5*, 8*, 9, 11, 14*, 15, 17*, 18*, 20, 21*, 22*, 24*, 25 (see example 5, not discussed in class)
Section 5.3 (Due 4/30)- #1, 3*, 4, 5, 6*, 7, 9*, 11, 13*, 16, 20*, 21*, 22*, 23, 24*, 25*, 27, 28*, 31, 32

That's just two lines as an example; the whole file is larger.

The Nearley grammar I wrote is in problems-grammar.ne here; it isn't entirely finished yet. I'm using the Moo tokenizer/lexer, following these instructions in the Nearley docs.

I'm currently testing my grammar with nearley-unparse, as explained here. I run the following command, where problems-grammar.js is the parser compiled by Nearley:

nearley-unparse problems-grammar.js -o test.txt

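Unfortunately, the unparser doesn't seem to generate sample strings that include text for the tokens, apart from the newline token: the literal parts of each line appear, but the section number, due date, and problem numbers are empty. Here is one output of nearley-unparse: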

Section  (Due )- #*, , 
Section  (Due )- #, *, 
Section  (Due )- #*, , , *, 
Section  (Due )- #*, *
Section  (Due )- #*, *, *, *

I'm wondering whether this is a flaw in my grammar or a flaw with Nearley/Moo itself. If it's a problem with my code, how can I fix it?


Solution

  • Since I didn't receive an answer here, I went ahead and asked in the Nearley GitHub repo.

    According to the maintainers, nearley-unparse can't currently generate strings to match a regular expression. There also aren't any plans to add that functionality as it would be a project in and of itself.

    Here is their full response:

    Hello there! Thanks for trying to post a StackOverflow question first, I’m sorry there wasn’t anyone able to help there :-)

    This is a limitation of the unparser: it doesn’t know how to generate random strings satisfying a regexp, nor are we planning to do so (that would be a project in itself!).

    Your grammar looks fine to me, at a brief glance; if you test it with nearley-test, hopefully you’ll find you get the parse trees you expect.
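
To illustrate the limitation described above, here is a minimal sketch in plain JavaScript. It does not use Nearley or Moo and is not how nearley-unparse is implemented. The token table, the token names, and the `render` and `samples` helpers are all hypothetical and are not taken from problems-grammar.ne. The idea is that a token defined by a literal string can be emitted verbatim, while a token defined by a regular expression has no obvious sample text unless one is supplied by hand.

```javascript
// Hypothetical token table, loosely modelled on the problem-list format.
// These names are illustrative only and do not come from problems-grammar.ne.
const tokens = {
  sectionKw: { literal: "Section " },
  sectionNum: { pattern: /\d+\.\d+/ },
  dueOpen: { literal: " (Due " },
  date: { pattern: /\d{1,2}\/\d{1,2}/ },
  dueClose: { literal: ")- #" },
  problemNum: { pattern: /\d+/ },
  star: { literal: "*" },
  sep: { literal: ", " },
};

// Hand-written sample text for each regex-defined token (an assumption of
// this sketch; nothing here derives text from the regex automatically).
const samples = {
  sectionNum: () => "5.2",
  date: () => "4/23",
  problemNum: () => "14",
};

// Render a sequence of token names to a string. Literal tokens are emitted
// as-is; regex tokens are emitted only if a sampler is provided for them.
function render(sequence, samplers = {}) {
  return sequence
    .map((name) => {
      const tok = tokens[name];
      if (tok.literal !== undefined) return tok.literal;
      const sampler = samplers[name];
      if (!sampler) return ""; // no way to produce text from the regex alone
      const text = sampler();
      // Guard against samples that do not fully match the token's pattern.
      const whole = new RegExp("^(?:" + tok.pattern.source + ")$");
      if (!whole.test(text)) {
        throw new Error("sample for " + name + " does not match its pattern");
      }
      return text;
    })
    .join("");
}

const sequence = [
  "sectionKw", "sectionNum", "dueOpen", "date", "dueClose",
  "problemNum", "star", "sep", "problemNum",
];

const withoutSamples = render(sequence);
const withSamples = render(sequence, samples);

console.log(JSON.stringify(withoutSamples)); // "Section  (Due )- #*, "
console.log(withSamples); // Section 5.2 (Due 4/23)- #14*, 14
```

The first output has the same shape as the lines in test.txt: only the literal pieces survive. The second shows that supplying sample text per token type by hand is one possible way to get readable test strings, at the cost of maintaining the samples alongside the lexer.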