
pyPEG2 giving wrong result


I have created a grammar with pyPEG2 for parsing statements such as:

A loves B but B hates A, A hates B and A loves D while B loves C

Here is my code:

import pypeg2 as pp


class Person(str):
    grammar = pp.word

class Action(pp.Keyword):
    grammar = pp.Enum(pp.K('loves'), pp.K('hates'))

class Separator(pp.Keyword):
    grammar = pp.Enum(pp.K(','), pp.K('\n'), pp.K('but'), pp.K('and'), pp.K('while'))

relation = Person, Action, Person

class Relations(pp.Namespace):
    grammar = relation, pp.maybe_some(Separator, relation)

However, when I try the following:

>>> love = pp.parse('A loves B but B hates A , B loves C', Relations)

I get:

Traceback (most recent call last):
  File "<pyshell#64>", line 1, in <module>
    love = pp.parse('A loves B but B hates A , B loves C', Relations)
  File "/home/michael/.local/lib/python3.5/site-packages/pypeg2/__init__.py", line 669, in parse
    raise parser.last_error
  File "<string>", line 1
    es B but B hates A , B loves C
                       ^
SyntaxError: expecting Separator
>>> 

If I change the statement to this one:

>>> love = pp.parse('A loves B but B hates A and B loves C', Relations)

There is no error, but the last block is missing for some reason:

>>> pp.compose(love)
'A loves B but B hates A'

So what am I doing wrong? The documentation is well written, but I can't find the mistake I made.

Hope somebody can help with this. Thanks in advance!


Solution

  • There are two problems here.

    The grammar you have for Separator uses the Keyword class, whose default regex is "\w+" (one or more word characters). (https://fdik.org/pyPEG/grammar_elements.html#keyword) A comma can never match that regex, so the ',' alternative in your Enum is never reached.

    You'll need to import re and define your own regex for that class. The regex should match either the additional characters you want to allow in a keyword, or at least one word character.

    import re
    import pypeg2 as pp

    class Separator(pp.Keyword):
        grammar = pp.Enum(pp.K(','), pp.K('\n'), pp.K('but'), pp.K('and'), pp.K('while'))
        # Accept a single comma in addition to the default run of word characters
        regex = re.compile(r'[,]|\w+')
    

    This should work.

    Note: I'm also not sure that using the newline as a separator will work; you may need to look into multiline parsing with a single grammar in pypeg2.

    For the second problem, I think this has to do with using a Namespace for the Relations type.

    >>> love
    Relations([(Symbol('#2024226558144'), 'A'), (Symbol('loves'), 
      Action('loves')), (Symbol('#2024226558384'), 'B'), (Symbol('but'),
      Separator('but')), (Symbol('#2024226558624'), 'B'), (Symbol('hates'),
      Action('hates')), (Symbol('#2024226558864'), 'A'), (Symbol('and'), 
      Separator('and')), (Symbol('#2024226559104'), 'B'),
      (Symbol('#2024226559344'), 'C'), ])
    

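    Notice that "loves" appears only once in this output, and that 'B' and 'C' end up next to each other: a Namespace keys its entries by name, so the second Action('loves') appears to replace the first one instead of being added. If you make the type a List instead, it makes more sense, since Namespaces are supposed to hold only named things, and it isn't clear what multiple definitions of the same namespaced item should mean.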

    class Relations(pp.List):
        grammar = relation, pp.maybe_some(Separator, relation)
    
    >>> love = pp.parse('A loves B but B hates A and B loves C', Relations)
    >>> love
    ['A', Action('loves'), 'B', Separator('but'), 'B', Action('hates'), 'A', Separator('and'), 'B', Action('loves'), 'C']
    >>> pp.compose(love)
    'A loves B but B hates A and B loves C'
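The Keyword regex problem can be checked without pyPEG2 at all. The sketch below models, in plain `re`, what I assume pyPEG2 does for a Keyword-derived class: match the class-level `regex` first, then check the matched text against the allowed keywords. The helper `match_separator` and the `SEPARATORS` set are hypothetical names used only for illustration.

```python
import re

# Assumption: a Keyword-derived class first matches its class-level regex,
# then checks the matched text against its Enum of allowed keywords.
SEPARATORS = {",", "but", "and", "while"}

DEFAULT_KEYWORD_REGEX = re.compile(r"\w+")       # Keyword's default regex
CUSTOM_SEPARATOR_REGEX = re.compile(r"[,]|\w+")  # the regex from the answer


def match_separator(regex, text):
    """Return the separator at the start of text, or None if it is rejected."""
    m = regex.match(text)
    if m is None:
        return None  # the class regex itself fails, e.g. \w+ against ","
    token = m.group(0)
    return token if token in SEPARATORS else None


for text in [", B loves C", "but B hates A", "and B loves C"]:
    print(repr(text),
          match_separator(DEFAULT_KEYWORD_REGEX, text),
          match_separator(CUSTOM_SEPARATOR_REGEX, text))
```

With the default regex the comma case is rejected before the Enum is even consulted, which lines up with the "expecting Separator" error; with the custom regex the comma is accepted while word separators such as "but" still work.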
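On the newline separator: one plausible reason it won't work, assuming (as I believe is the case) that pyPEG2 skips whitespace matching a "\s+"-style pattern before trying each grammar element, is that the newline is consumed as ordinary whitespace before any Separator is tried. The hypothetical `skip_whitespace` helper below only models that assumed behaviour; it is not pyPEG2 code.

```python
import re

# Assumption: the parser skips leading whitespace (something like \s+)
# before trying each grammar element. This is a model, not pyPEG2 code.
WHITESPACE = re.compile(r"\s+")


def skip_whitespace(text):
    """Drop leading whitespace, as the parser is assumed to do."""
    m = WHITESPACE.match(text)
    return text[m.end():] if m else text


rest = skip_whitespace("\nB loves C")
print(repr(rest))             # the newline is already gone
print(rest.startswith("\n"))  # so a '\n' keyword has nothing left to match
```

If that assumption holds, a K('\n') alternative could never match, and splitting on newlines would have to happen outside the grammar or with different whitespace handling.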
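The Namespace behaviour can be illustrated with a toy model in plain Python. This is a hypothetical simplification, not pyPEG2's actual Namespace implementation: named items (actions and separators) are keyed by their name, while unnamed items (persons) get a unique generated key, similar to the Symbol('#...') keys in the repr above. The function names are made up for the example.

```python
from collections import OrderedDict
from itertools import count

# Hypothetical model: these tokens are "named", everything else is a person.
NAMED = {"loves", "hates", "but", "and"}


def store_as_namespace(tokens):
    """Key named tokens by name and unnamed ones by a generated id."""
    ns = OrderedDict()
    ids = count()
    for tok in tokens:
        key = tok if tok in NAMED else f"#{next(ids)}"
        ns[key] = tok  # a repeated name overwrites the earlier entry in place
    return list(ns.values())


def store_as_list(tokens):
    """A list simply keeps every token, in order."""
    return list(tokens)


tokens = "A loves B but B hates A and B loves C".split()
print(store_as_namespace(tokens))
print(store_as_list(tokens))
```

The namespace version ends with 'B', 'C' and only one 'loves', the same shape as the Relations repr above, while the list version keeps all eleven tokens, which is why switching Relations to a List restores the last relation.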