smalltalkpetitparser

How to define Pascal variables in PetitParser


Here is the (simplified) EBNF section I'm trying to implement in PetitParser:

variable :: component / identifier
component :: indexed / field
indexed :: variable , $[ , blah , $]
field :: variable , $. , identifier

What I did was to add all these productions (except identifier) as ivars of my subclass of PPCompositeParser and define the corresponding methods as follows:

variable
  ^component / self identifier

component
  ^indexed / field

identifier
  ^(#letter asParser, (#word asParser) star) flatten

indexed
  ^variable , $[ asParser, #digit asParser, $] asParser

field
  ^variable , $. asParser, self identifier

start
  ^variable

Finally, I created a new instance of my parser and sent to it the message parse: 'a.b[0]'.

The problem: I get a stack overflow.


Solution

  • The problem is that your grammar is left recursive. PetitParser uses a top-down greedy algorithm to parse the input string. If you follow the steps, you'll see that it goes from start then variable -> component -> indexed -> variable. This is becomes a loop that gets executed infinitely without consuming any input, and is the reason of the stack overflow (that is the left-recursiveness in practice).

    The trick to solve the situation is to rewrite the parser by adding intermediate steps to avoid left-recursing. The basic idea is that the rewritten version will consume at least one character in each cycle. Let's start by simplifying a bit the parser refactoring the non-recursive parts of ´indexed´ and ´field´, and moving them to the bottom.

    variable
      ^component, self identifier
    
    component
      ^indexed / field
    
    indexed
      ^variable, subscript
    
    field
      ^variable, fieldName
    
    start
      ^variable
    
    
    subscript
        ^$[ asParser, #digit asParser, $] asParser
    
    fieldName
        ^$. asParser, self identifier
    
    identifier
      ^(#letter asParser, (#word asParser) star) flatten
    

    Now you can more easily see (by following the loop) that if the recursion in variable is to end, an identifier has to be found at the beginning. That's the only way to start, and then comes more input (or ends). Let's call that second part variable':

    variable
        ^self identifier, variable'
    

    now the variable' actually refers to something with the identifier consumed, and we can safely move the recusion from the left of indexed and field to the right in variable':

    variable'
        component', variable' / nil asParser
    
    component'
        ^indexed' / field'
    
    indexed'
        ^subscript
    
    field'
        ^fieldName
    

    I've written this answer without actually testing the code, but should be okish. The parser can be further simplified, I leave that as an excercise ;).

    For more information on left-recursion elimination you can have a look at left recursion elimination