regex, nearley, moo-lexer

Grammar - How to match optional and required whitespace before and after words?


I am using nearley and moo to build a fairly complex grammar. It works fine except for my whitespace requirements: I need to require whitespace where it is needed and allow it where it is optional, while keeping the grammar unambiguous.

For example:

After dinner, I went to bed.

I need to require whitespace between the words but allow it around the comma. So the following are also valid:

After dinner , I went to bed.
After dinner,I went to bed.

Below is a quick nearley grammar trying to do this. If you don't get the syntax, it's pretty easy to figure it out.

// Required whitespace
rws : [ \t]+
// Optional whitespace
ows : [ \t]*

sentence -> words %ows "," sentence
          | words

words    -> word %rws words
          | word

word     -> [a-zA-Z]+

The grammar may have other issues, but the idea is the same: as written, it is ambiguous. How can I define an unambiguous grammar that handles both optional and required whitespace?


Solution

  • I'm not familiar with nearley or moo, but the regex could be

    whitespace : ([ \t]*,[ \t]*|[ \t]+)
    

    and your grammar would become

    word %whitespace word
    

    Hopefully that makes sense and I didn't completely botch up the language.
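
A minimal sketch of the single-separator idea in plain JavaScript, without nearley or moo, may help check the regex against the example sentences. The names `SEPARATOR`, `WORD`, and `isValidSentence` are hypothetical, and the trailing period is an assumption taken from the example sentences rather than from the grammar in the question. Because a separator is always exactly one match (either a comma with optional surrounding whitespace, or a run of whitespace), there is only one way to split the input, which is what removes the ambiguity.

```javascript
// Hypothetical helper names; plain RegExp only, no nearley or moo involved.
// A separator is either a comma with optional whitespace on both sides,
// or a run of one or more spaces/tabs. The sticky flag (y) anchors each
// match at lastIndex, similar to how a lexer consumes input.
const SEPARATOR = /[ \t]*,[ \t]*|[ \t]+/y;
const WORD = /[a-zA-Z]+/y;

// Returns true if `text` is: word (separator word)* "."
// The final period is an assumption based on the example sentences.
function isValidSentence(text) {
  let pos = 0;

  // Try to match `re` exactly at `pos`; advance `pos` past the match on success.
  const eat = (re) => {
    re.lastIndex = pos;
    if (re.exec(text) === null) return false;
    pos = re.lastIndex;
    return true;
  };

  if (!eat(WORD)) return false;
  while (pos < text.length && text[pos] !== ".") {
    if (!eat(SEPARATOR) || !eat(WORD)) return false;
  }
  return text.slice(pos) === ".";
}

console.log(isValidSentence("After dinner, I went to bed."));
console.log(isValidSentence("After dinner , I went to bed."));
console.log(isValidSentence("After dinner,I went to bed."));
```

Under these assumptions, all three sentences from the question are accepted, while an input such as `After , , dinner.` is rejected because a separator must be followed by a word. In a real nearley grammar the same effect would presumably come from defining one separator token in the lexer instead of separate optional and required whitespace tokens.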