I'm not sure whether this is a problem with the Nearley.js library, the Moo tokenizer/lexer, or my own code, so I might need to submit this as an issue to the Nearley repo. All the referenced files can be found in this Gist.
I am attempting to write a Nearley grammar that will parse a list of homework problems for one of my classes. The problems are in problems.txt and look like this:
Section 5.2 (Due 4/23)- #3, 5*, 8*, 9, 11, 14*, 15, 17*, 18*, 20, 21*, 22*, 24*, 25 (see example 5, not discussed in class)
Section 5.3 (Due 4/30)- #1, 3*, 4, 5, 6*, 7, 9*, 11, 13*, 16, 20*, 21*, 22*, 23, 24*, 25*, 27, 28*, 31, 32
Those are just two lines as an example; the whole file is larger.
The Nearley grammar I wrote is in problems-grammar.ne here, and I'm not entirely finished with it yet. I'm using the Moo tokenizer/lexer according to these instructions in the Nearley docs.
I'm currently testing my grammar with the nearley-unparse tool, as explained here. I run the following command, where problems-grammar.js is the parser compiled by Nearley:
nearley-unparse problems-grammar.js -o test.txt
Unfortunately, the unparser doesn't seem to be generating strings that contain examples of the tokens, apart from the newline token. Here is one output of nearley-unparse:
Section (Due )- #*, ,
Section (Due )- #, *,
Section (Due )- #*, , , *,
Section (Due )- #*, *
Section (Due )- #*, *, *, *
I'm wondering whether this is a flaw in my grammar or a flaw with Nearley/Moo itself. If it's a problem with my code, how can I fix it?
Since I didn't receive an answer here, I went ahead and asked in the Nearley GitHub repo.
According to the maintainers, nearley-unparse can't currently generate strings to match a regular expression. There also aren't any plans to add that functionality, as it would be a project in and of itself.
Here is their full response:
Hello there! Thanks for trying to post a StackOverflow question first, I’m sorry there wasn’t anyone able to help there :-)
This is a limitation of the unparser: it doesn’t know how to generate random strings satisfying a regexp, nor are we planning to do so (that would be a project in itself!).
Your grammar looks fine to me, at a brief glance; if you test it with nearley-test, hopefully you’ll find you get the parse trees you expect.
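To make the limitation a bit more concrete, here is a toy sketch in plain JavaScript. It is not how nearley-unparse is actually implemented, and the token names and patterns are made up for illustration (they are not the ones from my Gist). The idea is only that a literal token can be emitted as-is, while a regex token has no single obvious example string, so a generator that doesn't know how to sample from a regex ends up emitting nothing for it.

```javascript
// Hypothetical token table, loosely in the style of a Moo lexer definition.
// Names and patterns are assumptions for illustration only.
const tokens = {
  sectionKw: "Section ",       // literal
  sectionNum: /[0-9]+\.[0-9]+/, // regex
  dueOpen: "(Due ",            // literal
  date: /[0-9]+\/[0-9]+/,      // regex
  dueClose: ")- ",             // literal
  hash: "#",                   // literal
  number: /[0-9]+/,            // regex
  star: "*",                   // literal
  comma: ", ",                 // literal
};

// A string literal is its own example. For a RegExp this toy generator
// gives up and returns an empty string instead of inventing a match.
function exampleFor(pattern) {
  return typeof pattern === "string" ? pattern : "";
}

// One possible sequence of tokens for a homework line with two problems.
const rule = [
  "sectionKw", "sectionNum", "dueOpen", "date", "dueClose",
  "hash", "number", "star", "comma", "number",
];

const line = rule.map((name) => exampleFor(tokens[name])).join("");
console.log(JSON.stringify(line));
```

The result has the same shape as the unparser output above: the literal pieces survive and every regex-backed token is blank. That matches the maintainers' advice that blank tokens in the unparser output don't say anything about whether the grammar itself is correct.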