I am completely stuck trying to understand why this fails to parse. Below is my simple grammar (I am just experimenting to understand Parsimonious, so the grammar may not make much sense).
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor
sql_grammar = Grammar(
"""
select_statement = "SELECT" ("ALL" / "DISTINCT")? object_alias_section
object_alias_section = object_name / alias
object_name = ~"[ 0-9]*"
alias = ~"[ A-Z]*"
"""
)
data = """SELECT A"""
tree = sql_grammar.parse(data)
print("tree:", tree, "\n")
The input "SELECT 10" parses, but for some reason "SELECT A" fails to parse. My understanding is that either object_name or alias should match. What am I doing wrong? Thanks in advance.
There are two problems with your grammar:
1. Parsimonious doesn't handle whitespace automatically; you must take care of it yourself (some ideas can be taken from https://github.com/erikrose/parsimonious/blob/master/parsimonious/grammar.py#L224).
2. As stated in README.md, the / operator matches the first alternative that succeeds, so it tries object_name first. Because the space after SELECT has not been consumed yet, object_name matches that space and the choice is settled, leaving A unparsed. Even if the space were handled correctly, object_name would still match the empty string, and parsing would again end with an error.
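Both effects can be reproduced without Parsimonious. The following is only a rough sketch using the standard re module: first_match is a hypothetical helper that imitates ordered choice by committing to the first pattern that matches at a given position. It is not how Parsimonious is implemented.

```python
import re


def first_match(patterns, text, pos):
    """Imitate PEG ordered choice: commit to the first pattern that matches at pos."""
    for name, pattern in patterns:
        m = re.compile(pattern).match(text, pos)
        if m is not None:
            return name, m.group(0), m.end()
    return None


data = "SELECT A"
start = len("SELECT")  # position right after the keyword

# Original rules: `*` allows zero repetitions, and the class includes a space.
original = [("object_name", r"[ 0-9]*"), ("alias", r"[ A-Z]*")]

# Problem 1: object_name wins by consuming only the space; "A" is left over.
name, matched, end = first_match(original, data, start)
leftover = data[end:]

# Problem 2: even with the space skipped, object_name matches the empty string.
name_after_ws, matched_after_ws, _ = first_match(original, data, start + 1)

# With `+`, object_name fails on "A", so the choice falls through to alias.
fixed = [("object_name", r"[ 0-9]+"), ("alias", r"[ A-Z]+")]
fixed_name, fixed_matched, fixed_end = first_match(fixed, data, start + 1)

print(name, repr(matched), repr(leftover))
print(name_after_ws, repr(matched_after_ws))
print(fixed_name, repr(fixed_matched))
```

In both of the first two cases the choice succeeds with object_name, so alias is never tried, and the overall parse fails because input remains.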
To fix your grammar, I suggest changing it as follows:
# Raw strings (r"...") keep the backslash in \s from being treated as a Python escape.
sql_grammar = Grammar(
r"""
select_statement = "SELECT" (ws ("ALL" / "DISTINCT"))? ws object_alias_section
object_alias_section = object_name / alias
object_name = ~"[ 0-9]+"
alias = ~"[ A-Z]+"
ws = ~r"\s+"
"""
)
and everything should parse correctly.