I'm using parsimonious to do some parsing and I'm having trouble figuring out how to properly parse alternatives that share the first character, without having to list them in a particular order.
For example:
Text:
2 > 3
2 >= 3
Grammar:
expr = ~"[0-9]+" space operator space ~"[0-9]+"
operator = ">" / "==" / "<" / ">=" / "<="
space = ~"[\\s]*"
The first line of the text will parse correctly, but the second line won't. It seems like it matches ">" then gets stuck since it sees a "=". It never matches ">=" as a whole. How do I make it do that without having to specify these in careful order? I tried using "&" for lookahead matching but that doesn't seem to work.
parsimonious is based on PEGs (parsing expression grammars). One of the distinguishing properties of PEGs is that the alternative operator is ordered, i.e. the choices are always tried from left to right and the first successful match wins. Once ">" has matched, the later ">=" alternative is never tried, and the rest of the expr rule then fails on the "=". Thus, PEG grammars are never ambiguous, but you must be aware of this property when writing your grammar and order the alternatives accordingly. A PEG is effectively a specification of a recursive descent parser.
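To see the first-match-wins rule in isolation, here is a minimal pure-Python sketch of ordered choice over string literals. It does not use parsimonious at all; `ordered_choice` and `parse_comparison` are hypothetical helper names invented for this illustration, and the hand-written matching only approximates what a PEG library does internally.

```python
import re

def ordered_choice(alternatives, text, pos):
    """Try each literal left to right; return the first that matches at pos."""
    for literal in alternatives:
        if text.startswith(literal, pos):
            return literal  # first success wins; later alternatives are skipped
    return None

def parse_comparison(text, operators):
    """Match: number, spaces, operator, spaces, number. Return the operator or None."""
    m = re.match(r"[0-9]+\s*", text)
    if m is None:
        return None
    pos = m.end()
    op = ordered_choice(operators, text, pos)
    if op is None:
        return None
    pos += len(op)
    # The choice has already succeeded, so it is not re-entered to try a
    # later alternative if the remainder of the rule fails to match.
    if re.fullmatch(r"\s*[0-9]+", text[pos:]) is None:
        return None
    return op

bad_order = [">", "==", "<", ">=", "<="]    # order from the question
good_order = [">=", "<=", "==", ">", "<"]   # longer operators first

print(parse_comparison("2 > 3", bad_order))    # >
print(parse_comparison("2 >= 3", bad_order))   # None
print(parse_comparison("2 >= 3", good_order))  # >=
```

With the question's ordering, ">" is consumed and the leftover "=" makes the whole rule fail; with the longer operators listed first, the same input parses.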
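In your case you should really reorder the alternatives in the operator production so that ">=" is tried before ">" (and likewise "<=" before "<"). The other solution is to prevent the ">" match from succeeding when it is followed by "=". This is achieved with the syntactic predicate Not (negative lookahead), which succeeds only if the expression after it fails and consumes no input either way. In parsimonious it is written "!", so this should also work: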
operator = (">" !"=") / "==" / ("<" !"=") / ">=" / "<="
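As a library-independent illustration of what the Not predicate does, the following sketch builds the operator rule from a handful of hand-written combinators. None of these functions (`lit`, `regex`, `seq`, `choice`, `not_`, `matches`) come from parsimonious; they are hypothetical stand-ins for its sequence, ordered choice, and "!" constructs. Note that "<" needs the same guard as ">", because it also precedes "<=" in the list of alternatives.

```python
import re

# Each parser is a function (text, pos) -> new position, or None on failure.

def lit(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end  # ordered choice: first success wins
        return None
    return parse

def not_(parser):
    # Negative lookahead: succeeds, consuming nothing, only if parser fails.
    return lambda text, pos: pos if parser(text, pos) is None else None

number = regex(r"[0-9]+")
space = regex(r"\s*")
operator = choice(
    seq(lit(">"), not_(lit("="))),   # ">" only when not followed by "="
    lit("=="),
    seq(lit("<"), not_(lit("="))),   # same guard for "<"
    lit(">="),
    lit("<="),
)
expr = seq(number, space, operator, space, number)

def matches(text):
    """True if expr consumes the entire input."""
    return expr(text, 0) == len(text)

for line in ["2 > 3", "2 >= 3", "2 < 3", "2 <= 3", "2 == 3"]:
    print(line, matches(line))
```

Because the guarded ">" alternative now fails on ">=", the choice falls through to the later ">=" alternative, so the original ordering can stay as it is.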
This applies generally to all PEG parsers; it is not specific to parsimonious.