Imagine the following types of strings:
```
if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)
```
Now, I'd like to get the outermost expressions in parentheses, so I wrote a PEG parser (with parsimonious) using the following grammar:
```python
from parsimonious.grammar import Grammar

grammar = Grammar(
    r"""
    program  = if expr+
    expr     = term (operator term)*
    term     = (factor operator factor) / factor
    factor   = (lpar word operator word rpar) / (lpar expr rpar)
    if       = "if" ws
    and      = "and"
    or       = "or"
    operator = ws? (and / or) ws?
    word     = ~"\w+"
    lpar     = "("
    rpar     = ")"
    ws       = ~"\s*"
    """)
```
which parses just fine with

```python
tree = grammar.parse(string)
```
Now the question is: how do I write a `NodeVisitor` class for this tree that returns only the factors? My problem is the second branch of `factor` (`lpar expr rpar`), which can be nested arbitrarily deep. I tried a simple recursive walk:
```python
def walk(node, level=0):
    if node.expr.name == "factor":
        print(level * "-", node.text)
    for child in node.children:
        walk(child, level + 1)

walk(tree)
```
but to no avail: the nested factors are printed as well, so the same sub-expressions show up more than once.
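If you want to keep the original grammar and a plain recursive function, one possible fix is to stop descending as soon as a `factor` is found, so factors nested inside it are never visited. This sketch assumes parsimonious' `Node` attributes `expr_name`, `text` and `children`. To keep it runnable without parsimonious, it walks a small hand-built tree of a hypothetical `Stub` class exposing the same three attributes; that tree is much flatter than what `grammar.parse` really produces.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Hypothetical stand-in for parsimonious' Node (expr_name, text, children only)."""
    expr_name: str
    text: str
    children: list = field(default_factory=list)


def top_level_factors(node):
    """Collect the text of every `factor` node that is not inside another factor."""
    if node.expr_name == "factor":
        # Stop here: do not descend, so nested factors are never reported.
        return [node.text]
    found = []
    for child in node.children:
        found.extend(top_level_factors(child))
    return found


# Hand-built, heavily simplified tree for "if ((a and b) or (c and d)) or (e and f)".
tree = Stub("program", "if ((a and b) or (c and d)) or (e and f)", [
    Stub("if", "if "),
    Stub("expr", "((a and b) or (c and d)) or (e and f)", [
        Stub("factor", "((a and b) or (c and d))", [
            Stub("factor", "(a and b)"),
            Stub("factor", "(c and d)"),
        ]),
        Stub("operator", " or "),
        Stub("factor", "(e and f)"),
    ]),
])

for text in top_level_factors(tree):
    print(text)
```

With a real tree, `top_level_factors(grammar.parse(string))` should return the three parts from the question, since none of them sits inside another `factor`.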
Note: This question is based on another one on StackOverflow.
How would I go about getting `((a1 and b) or (a2 and c))`, `(c and d)` and `(e and f)` as three separate parts?
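You could create a visitor that "listens" for `(` nodes in the parse tree and increases a depth variable, and decreases it whenever it encounters a `)`. Then, in the method that is called for a parenthesised expression, you check the depth before adding the expression to the list the visitor returns. This works because `NodeVisitor` visits a node's children before calling that node's own `visit_` method: by the time the method for a parenthesised term runs, its own `(` and `)` have already been counted, so the depth is back at the level of the enclosing expression, and only the outermost groups see a depth of 0.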
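The depth counter itself does not depend on parsimonious. As a parser-free illustration (a swapped-in technique, not the visitor), this sketch scans the raw string and records each group that opens at depth 0. The function name `top_level_groups` is hypothetical, and it assumes parentheses are balanced and never appear inside quoted strings or other literals.

```python
def top_level_groups(text: str) -> list:
    """Return every parenthesised group that starts at nesting depth 0."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # an outermost group opens here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at index {i}")
            if depth == 0:
                groups.append(text[start:i + 1])  # the outermost group closes here
    if depth != 0:
        raise ValueError("unbalanced '('")
    return groups


for group in top_level_groups("if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)"):
    print(group)
```

This can be handy for sanity-checking the visitor's output, but it knows nothing about the grammar, so it will accept input the PEG would reject.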
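Here is a quick example. Note that it uses a simplified grammar in which a `term` is either a parenthesised `expr` or a plain `word`: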
```python
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

grammar = Grammar(
    r"""
    program  = if expr+
    expr     = term (operator term)*
    term     = (lpar expr rpar) / word
    if       = "if" ws
    and      = "and"
    or       = "or"
    operator = ws? (and / or) ws?
    word     = ~"\w+"
    lpar     = "("
    rpar     = ")"
    ws       = ~"\s*"
    """)


class ParExprVisitor(NodeVisitor):
    def __init__(self):
        self.depth = 0       # current nesting level of parentheses
        self.par_expr = []   # collected outermost parenthesised expressions

    def visit_term(self, node, visited_children):
        # Children (including this term's own parentheses) were visited first,
        # so depth is the nesting level around this term.
        if self.depth == 0:
            self.par_expr.append(node.text)

    def visit_lpar(self, node, visited_children):
        self.depth += 1

    def visit_rpar(self, node, visited_children):
        self.depth -= 1

    def generic_visit(self, node, visited_children):
        # Called for every node without its own visit_ method; returning the
        # list here means visitor.visit(tree) hands back the collected result.
        return self.par_expr


tree = grammar.parse("if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)")
visitor = ParExprVisitor()
for expr in visitor.visit(tree):
    print(expr)
```
which prints:
```
((a1 and b) or (a2 and c))
(c and d)
(e and f)
```
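To see the visiting order that the depth check relies on, here is a stdlib-only sketch that mimics how parsimonious' `NodeVisitor.visit` dispatches: it visits all children first, then calls `visit_<expr_name>`, falling back to `generic_visit`. `Stub`, `MiniVisitor` and `DepthVisitor` are hypothetical stand-ins, not parsimonious classes. One difference: parsimonious' own default `generic_visit` raises `NotImplementedError`, while the stand-in's just returns the visited children.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Hypothetical stand-in for a parse-tree node."""
    expr_name: str
    text: str
    children: list = field(default_factory=list)


class MiniVisitor:
    """Hypothetical stand-in mimicking the dispatch order of NodeVisitor.visit."""

    def visit(self, node):
        # Children are visited first (post-order), then the node's own method.
        visited = [self.visit(child) for child in node.children]
        method = getattr(self, "visit_" + node.expr_name, self.generic_visit)
        return method(node, visited)

    def generic_visit(self, node, visited_children):
        return visited_children


class DepthVisitor(MiniVisitor):
    """Same depth logic as ParExprVisitor, plus a trace line per term."""

    def __init__(self):
        self.depth = 0
        self.par_expr = []

    def visit_lpar(self, node, visited_children):
        self.depth += 1

    def visit_rpar(self, node, visited_children):
        self.depth -= 1

    def visit_term(self, node, visited_children):
        print(f"visit_term {node.text!r} at depth {self.depth}")
        if self.depth == 0:
            self.par_expr.append(node.text)


# Hand-built, simplified tree for the single term "(x and (y))".
tree = Stub("term", "(x and (y))", [
    Stub("lpar", "("),
    Stub("expr", "x and (y)", [
        Stub("term", "x", [Stub("word", "x")]),
        Stub("operator", " and "),
        Stub("term", "(y)", [
            Stub("lpar", "("),
            Stub("expr", "y", [Stub("term", "y", [Stub("word", "y")])]),
            Stub("rpar", ")"),
        ]),
    ]),
    Stub("rpar", ")"),
])

visitor = DepthVisitor()
visitor.visit(tree)
print(visitor.par_expr)  # ['(x and (y))']
```

The trace shows the inner terms being visited at depths 1 and 2, and the outer term only after its closing `)` has brought the depth back to 0. One caveat follows from the same logic: with the answer's grammar, a bare top-level word (for example the `x` in `if x or (a and b)`) is also a `term` at depth 0, so `ParExprVisitor` would collect it too; checking that `node.text` starts with `(` is one way to filter those out.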