Question
Is there any way to declare function arguments as non-strict (passed by-name)?
If this is not possible directly: are there any helper functions or decorators that help me achieve something similar?
Concrete example
Here is a littly toy-example to experiment with.
Suppose that I want to build a tiny parser-combinator library that can cope with the following classic grammar for arithmetic expressions with parentheses (numbers replaced by a single literal value 1
for simplicity):
num = "1"
factor = num
| "(" + expr + ")"
term = factor + "*" + term
| factor
expr = term + "+" + expr
| term
Suppose that I define a parser combinator as an object that has a method parse
that can take list of tokens, current position, and either throw a parse error, or return a result and a new position. I can nicely define a ParserCombinator
base class that provides +
(concatenation) and |
(alternative). Then I can define parser combinators that accept constant strings, and implement +
and |
:
# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass
# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
def __add__(self, next):
return AddCombinator(self, next)
def __or__(self, other):
return OrCombinator(self, other)
# Literally taken string constants
class Lit(ParserCombinator):
def __init__(self, string):
self.string = string
def parse(self, tokens, pos):
if pos < len(tokens):
t = tokens[pos]
if t == self.string:
return t, (pos + 1)
else:
raise ParseError
else:
raise UnexpectedEndOfInput
def lit(str):
return Lit(str)
# Concatenation
class AddCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
x, p1 = self.first.parse(tokens, pos)
y, p2 = self.second.parse(tokens, p1)
return (x, y), p2
# Alternative
class OrCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
try:
return self.first.parse(tokens, pos)
except:
return self.second.parse(tokens, pos)
So far, everything is fine. However, because the non-terminal symbols of the grammar are defined in a mutually recursive fashion, and I cannot eagerly unfold the tree of all possible parser combinations, I have to work with factories of parser combinators, and wrap them into something like this:
# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
def __init__(self, parserFactory):
self.parserFactory = parserFactory
def parse(self, tokens, pos):
return self.parserFactory().parse(tokens, pos)
def p(parserFactory):
return LazyParserCombinator(parserFactory)
This indeed allows me to write down the grammar in a way that is very close to the EBNF:
num = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term = p(lambda: (factor + lit("*") + term) | factor)
expr = p(lambda: (term + lit("+") + expr) | term)
And it actually works:
tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))
However, the p(lambda: ...)
in every line is a bit annoying. Is there some idiomatic way to get rid of it? It would be nice if one could somehow pass the whole RHS of a rule "by-name", without triggering the eager evaluation of the infinite mutual recursion.
What I've tried
I've checked what's available in the core language: it seems that only if
, and
and or
can "short-circuit", please correct me if I'm wrong.
I've tried looking at how other non-toy-example libraries do this.
For example,
funcparserlib
uses explicit forward declarations to avoid mutual recursion
(look at the forward_decl
and value.define
part in github README.md example code).
The parsec.py
uses some special @generate
decorators
and seems to do something like monadic parsing using coroutines.
That's all very nice, but my goal is to understand what options
I have with regards to the basic evaluation strategies available
in Python.
I've also found something like the lazy_object_proxy.Proxy
, but it didn't seem to help to instantiate such objects in more concise way.
So, is there a nicer way to pass arguments by-name and avoid the blowup of mutually recursively defined values?
It's a nice idea, but it's not something that Python's syntax allows for: Python expressions are always evaluated strictly (with the exception of if
blocks and and
and or
short-circuiting expressions).
In particular, the problem is that in an expression like:
num = p(lit("1"))
The p
function argument is always received with a new name binding to the same object. The object resulting from evaluating lit("1")
is not named anything (until a name is created by the formal parameter to p
), so there is no name there to bind to. Conversely, there must be an object there, or otherwise p
wouldn't be able to receive a value at all.
What you could do is add a new object to use instead of a lambda to defer evaluation of a name. For example, something like:
class DeferredNamespace(object):
def __init__(self, namespace):
self.__namespace = namespace
def __getattr__(self, name):
return DeferredLookup(self.__namespace, name)
class DeferredLookup(object):
def __init__(self, namespace, name):
self.__namespace = namespace
self.__name = name
def __getattr__(self, name):
return getattr(getattr(self.__namespace, self.__name), name)
d = DeferredNamespace(locals())
num = p(d.lit("1"))
In this case, d.lit
actually doesn't return lit
, it returns a DeferredLookup
object that will use getattr(locals(), 'lit')
to resolve its members when they are actually used. Note that this captures locals()
eagerly, which you might not want; you can adapt that to use a lambda, or better yet just create all your entities in some other namespace anyway.
You still get the wart of the d.
in the syntax, which may or may not be a deal-breaker, depending on your goals with this API.