Tags: python, parsing, tatsu

TatSu not stopping at end of text


I'm trying to add some simple error handling to a DSL grammar in TatSu. I wrote a small grammar that parses input into either numbers or errors.

@@grammar :: Nums

start = wordlist $ ;

wordlist = {word eol}+ ;

eol = ':' ;

word
  = 
  | num:num     # Number.
  | err:err     # Not a number, error.
  | eol:eol     # Blank line.
  ;

num
  =
  | sci:sci      # Scientific 'e' notation
  | float:float  # Normal real number notation
  | int:int      # Integer
  ;


int = /[-+]?\d+\.?/ ;
float = /[-+]?\d*\.\d+/ ;
sci = /([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)/ ;

err = ->&eol ;

The input looks like:

123 : 2.3 : -1.: Error : -.0123 : -2.1e-2 : +1.2e+3 :
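Before looking at the loop, the number patterns can be checked on their own. The following is a minimal plain-Python sketch, not TatSu: it applies the grammar's sci, float and int regexes in the same order as the num rule to each ':'-separated field of the input above, and reports anything else as an error. NUM_PATTERNS and classify are hypothetical names, and using re.fullmatch on whole fields is a simplification of how TatSu matches a pattern at the current position.

```python
import re

# Hypothetical helper: the token regexes copied from the grammar above,
# tried in the same order as the `num` rule (sci, then float, then int).
NUM_PATTERNS = [
    ("sci", r"([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)"),
    ("float", r"[-+]?\d*\.\d+"),
    ("int", r"[-+]?\d+\.?"),
]


def classify(field: str) -> tuple[str, str]:
    """Return (kind, text) for one ':'-separated field.

    Ordered choice: the first pattern that matches the whole field wins;
    anything else is reported as "err", like the `err` alternative.
    """
    text = field.strip()
    for kind, pattern in NUM_PATTERNS:
        if re.fullmatch(pattern, text):
            return kind, text
    return "err", text


sample = "123 : 2.3 : -1.: Error : -.0123 : -2.1e-2 : +1.2e+3 :"
# The input ends with ':', so split() yields a final empty field; drop it.
fields = sample.split(":")[:-1]
for field in fields:
    print(classify(field))
```

It classifies 123 and -1. as int, 2.3 and -.0123 as float, the two exponent forms as sci, and Error as err, which agrees with the trace: each field is recognized correctly, and the trouble only starts after the last ':'.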

With tracing on, I can see that it correctly parses all of the input. When it gets to the end, however, it seems to keep parsing in an infinite loop. From the last number to the start of the looping, the output looks like:

↙word↙wordlist↙start ~1:44
 +1.2e+3 :
↙num↙word↙wordlist↙start ~1:45
+1.2e+3 :
↙sci↙num↙word↙wordlist↙start ~1:45
+1.2e+3 :
≡'+1.2e+3' /([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)/
 :
≡sci↙num↙word↙wordlist↙start ~1:52
 :
≡num↙word↙wordlist↙start ~1:52
 :
≡word↙wordlist↙start ~1:52
 :
↙eol↙wordlist↙start ~1:52
 :
≡':'
≡eol↙wordlist↙start ~1:54
↙word↙wordlist↙start ~1:54
↙num↙word↙wordlist↙start
↙sci↙num↙word↙wordlist↙start
≢'' /([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)/
↙float↙num↙word↙wordlist↙start
≢'' /[-+]?\d*\.\d+/
↙int↙num↙word↙wordlist↙start
≢'' /[-+]?\d+\.?/
≢num↙word↙wordlist↙start
↙err↙word↙wordlist↙start
↙eol↙err↙word↙wordlist↙start
≢':'
≢eol↙err↙word↙wordlist↙start
↙eol↙err↙word↙wordlist↙start
≢eol↙err↙word↙wordlist↙start
   :
   :
   :

For the life of me, I can't figure out why it doesn't stop. I'm not even sure what it is trying to parse. Can anyone help?

Thanks!


Solution

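  • This is not a TatSu issue. The grammar given doesn't parse the desired language. You have eol all over the place: it appears in wordlist ({word eol}+), as an alternative of word, and as the lookahead target of err.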

    You can try something like this:

    wordlist = (eol).{word} $;
    
    eol = ':' ;
    
    word
      = 
      | num:num     # Number.
      | err:err     # Not a number, error.
      ;
    
    err = ->&(eol|$) ;
    

    A clearer version, using the ':' token directly instead of a separate eol rule, might be:

    wordlist = ':'.{word} $;
    
    word
      = 
      | num:num     # Number.
      | err:err     # Not a number, error.
      ;
    
    err = ->&(':'|$) ;
    

    By the way, very nice use of ->& for a recovery rule!
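To see why the answer's err rule needs the |$ alternative, here is a minimal hand-written Python sketch of what the corrected grammar accepts. It is not TatSu: wordlist, word, err, skip_ws and the allow_end flag are hypothetical names, and the number regex is a simplified merge of sci, float and int. Because the sample input ends with ':', the join needs one more word after the last separator, and only a recovery rule that may also stop at end of text ($) can match that empty remainder.

```python
from __future__ import annotations

import re

# Hypothetical hand-written parser mirroring the answer's grammar:
#   wordlist = ':'.{word} $ ;   i.e. word { ':' word } then end of text
#   word     = num | err ;
#   err      = ->&(':'|$) ;     skip until ':' (or, with |$, end of text)
# Simplified number regex: sci | float | int, tried in that order.
NUM = re.compile(r"[+-]?\d*\.?\d+[Ee][+-]?\d+|[-+]?\d*\.\d+|[-+]?\d+\.?")
SEP = ":"


def skip_ws(text: str, pos: int) -> int:
    """Skip whitespace, roughly as TatSu does between tokens by default."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def err(text: str, pos: int, allow_end: bool) -> tuple[tuple[str, str], int] | None:
    """Recovery rule: consume characters until ':' is next (or end of text, if allowed)."""
    start = pos
    while not text.startswith(SEP, pos):
        if pos == len(text):
            if not allow_end:
                return None  # no ':' left to skip to: the rule fails
            break  # the |$ alternative: stop at end of text
        pos += 1
    return ("err", text[start:pos].strip()), pos


def word(text: str, pos: int, allow_end: bool) -> tuple[tuple[str, str], int] | None:
    """Ordered choice: try a number first, fall back to the recovery rule."""
    pos = skip_ws(text, pos)
    m = NUM.match(text, pos)
    if m:
        return ("num", m.group()), m.end()
    return err(text, pos, allow_end)


def wordlist(text: str, allow_end: bool = True) -> list[tuple[str, str]] | None:
    """Parse word { ':' word } followed by end of text; return None on failure."""
    result = word(text, 0, allow_end)
    items = []
    while result is not None:
        item, pos = result
        items.append(item)
        pos = skip_ws(text, pos)
        if not text.startswith(SEP, pos):
            # No more separators: succeed only if the whole input was consumed.
            return items if pos == len(text) else None
        result = word(text, pos + len(SEP), allow_end)
    return None  # a word (the first one, or one after a ':') failed to parse


sample = "123 : 2.3 : -1.: Error : -.0123 : -2.1e-2 : +1.2e+3 :"
print(wordlist(sample))                   # err may stop at end of text (|$)
print(wordlist(sample, allow_end=False))  # without |$: fails after the trailing ':'
```

With the |$ alternative the sample parses into six numbers, an err for Error, and an empty err for the trailing ':'. Without it, err has no ':' left to skip to after the last separator, so the whole parse fails. The sketch fails cleanly; it does not try to reproduce the looping seen in the question's trace.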