Tags: javascript, parsing, lexical-analysis, peg

Ignore whitespace with PEG.js


I want my grammar to ignore whitespace and newlines so that they do not appear in the PEG.js output. Also, the literal inside the parentheses should be returned in a new array.

Grammar

start
  = 'a'? sep+ ('cat'/'dog') sep* '(' sep* stmt_list sep* ')'

stmt_list
  = exp: [a-zA-Z]+ { return new Array(exp.join('')) }

sep
  = [' '\t\r\n]

Test case

a dog( Harry )

Output

[
   "a",
   [
      " "
   ],
   "dog",
   [],
   "(",
   [
      " "
   ],
   [
       "Harry"
   ],
   [
      " "
   ],
   ")"
]

Output I want

[
   "a",
   "dog",
   [
      "Harry"
   ]
]

Solution

  • You have to break the grammar up into more rules ("non-terminals"). Each rule matches its own trailing whitespace but labels and returns only the part you care about, so the whitespace never reaches the output:

    start
      = article? animal stmt_list
    
    article
      = article:'a' __ { return article; }
    
    animal
      = animal:('cat'/'dog') _ { return animal; }
    
    stmt_list
      = '(' _ exp:[a-zA-Z]+ _ ')' { return [ exp.join('') ]; }
    
    // optional whitespace
    _  = [ \t\r\n]*
    
    // mandatory whitespace
    __ = [ \t\r\n]+
    

    Thanks for asking this question!

    Edit: To increase readability, the grammar uses two whitespace productions: _ (optional) and __ (mandatory).
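
To make it clearer why the whitespace disappears, here is a rough hand-written JavaScript sketch of what the rules above do. It is not the code PEG.js generates, and the function names (parse, optionalWs, mandatoryWs, literal, stmtList) are made up for illustration. Each function consumes whitespace where the corresponding rule does, but returns only what the rule's action returns:

```javascript
// Illustrative sketch only: a hand-written mirror of the grammar above,
// not PEG.js output. All function names here are hypothetical.
function parse(input) {
  let pos = 0;
  const isWs = (ch) => ch === " " || ch === "\t" || ch === "\r" || ch === "\n";

  // _ = [ \t\r\n]*   (always succeeds, result is thrown away)
  function optionalWs() {
    while (pos < input.length && isWs(input[pos])) pos++;
    return true;
  }

  // __ = [ \t\r\n]+  (fails if no whitespace was consumed)
  function mandatoryWs() {
    const start = pos;
    optionalWs();
    return pos > start;
  }

  // Match a literal string at the current position, or return null.
  function literal(text) {
    if (!input.startsWith(text, pos)) return null;
    pos += text.length;
    return text;
  }

  // article = article:'a' __ { return article; }
  function article() {
    const start = pos;
    if (literal("a") !== null && mandatoryWs()) return "a";
    pos = start; // backtrack, as a PEG does when a sequence fails
    return null; // the "?" in start turns this failure into "no article"
  }

  // animal = animal:('cat'/'dog') _ { return animal; }
  function animal() {
    const name = literal("cat") || literal("dog");
    if (name === null) throw new SyntaxError("expected 'cat' or 'dog' at " + pos);
    optionalWs();
    return name;
  }

  // stmt_list = '(' _ exp:[a-zA-Z]+ _ ')' { return [ exp.join('') ]; }
  function stmtList() {
    if (literal("(") === null) throw new SyntaxError("expected '(' at " + pos);
    optionalWs();
    const start = pos;
    while (pos < input.length && /[a-zA-Z]/.test(input[pos])) pos++;
    if (pos === start) throw new SyntaxError("expected a letter at " + pos);
    const word = input.slice(start, pos);
    optionalWs();
    if (literal(")") === null) throw new SyntaxError("expected ')' at " + pos);
    return [word];
  }

  // start = article? animal stmt_list
  const result = [article(), animal(), stmtList()];
  if (pos !== input.length) throw new SyntaxError("unexpected input at " + pos);
  return result;
}

console.log(JSON.stringify(parse("a dog( Harry )"))); // ["a","dog",["Harry"]]
```

When the article is missing, this sketch puts null in the first slot, e.g. parse("cat(Tom)") gives [null, "cat", ["Tom"]]. As far as I know, recent PEG.js versions behave the same way for an unmatched optional, but check against the version you are using.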