pythonparser-combinatorscallbynameevaluation-strategy

Non-strict by-name arguments in Python?


Question

Is there any way to declare function arguments as non-strict (passed by-name)?

If this is not possible directly: are there any helper functions or decorators that help me achieve something similar?


Concrete example

Here is a littly toy-example to experiment with.

Suppose that I want to build a tiny parser-combinator library that can cope with the following classic grammar for arithmetic expressions with parentheses (numbers replaced by a single literal value 1 for simplicity):

num    = "1"

factor = num 
       | "(" + expr + ")"

term   = factor + "*" + term 
       | factor

expr   = term + "+" + expr
       | term

Suppose that I define a parser combinator as an object that has a method parse that can take list of tokens, current position, and either throw a parse error, or return a result and a new position. I can nicely define a ParserCombinator base class that provides + (concatenation) and | (alternative). Then I can define parser combinators that accept constant strings, and implement + and |:

# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass

# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
  def __add__(self, next):
    return AddCombinator(self, next)
  def __or__(self, other):
    return OrCombinator(self, other)

# Literally taken string constants
class Lit(ParserCombinator):
  def __init__(self, string):
    self.string = string

  def parse(self, tokens, pos):
    if pos < len(tokens):
      t = tokens[pos]
      if t == self.string:
        return t, (pos + 1)
      else:
        raise ParseError
    else:
      raise UnexpectedEndOfInput

def lit(str): 
  return Lit(str)

# Concatenation
class AddCombinator(ParserCombinator):
  def __init__(self, first, second):
    self.first = first
    self.second = second
  def parse(self, tokens, pos):
    x, p1 = self.first.parse(tokens, pos)
    y, p2 = self.second.parse(tokens, p1)
    return (x, y), p2

# Alternative
class OrCombinator(ParserCombinator):
  def __init__(self, first, second):
    self.first = first
    self.second = second
  def parse(self, tokens, pos):
    try:
      return self.first.parse(tokens, pos)
    except:
      return self.second.parse(tokens, pos)

So far, everything is fine. However, because the non-terminal symbols of the grammar are defined in a mutually recursive fashion, and I cannot eagerly unfold the tree of all possible parser combinations, I have to work with factories of parser combinators, and wrap them into something like this:

# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
  def __init__(self, parserFactory):
    self.parserFactory = parserFactory
  def parse(self, tokens, pos):
    return self.parserFactory().parse(tokens, pos)

def p(parserFactory):
  return LazyParserCombinator(parserFactory)

This indeed allows me to write down the grammar in a way that is very close to the EBNF:

num    = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term   = p(lambda: (factor + lit("*") + term) | factor)
expr   = p(lambda: (term + lit("+") + expr) | term)

And it actually works:

tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))

However, the p(lambda: ...) in every line is a bit annoying. Is there some idiomatic way to get rid of it? It would be nice if one could somehow pass the whole RHS of a rule "by-name", without triggering the eager evaluation of the infinite mutual recursion.


What I've tried

I've checked what's available in the core language: it seems that only if, and and or can "short-circuit", please correct me if I'm wrong.

I've tried looking at how other non-toy-example libraries do this.

I've also found something like the lazy_object_proxy.Proxy, but it didn't seem to help to instantiate such objects in more concise way.

So, is there a nicer way to pass arguments by-name and avoid the blowup of mutually recursively defined values?


Solution

  • It's a nice idea, but it's not something that Python's syntax allows for: Python expressions are always evaluated strictly (with the exception of if blocks and and and or short-circuiting expressions).

    In particular, the problem is that in an expression like:

    num = p(lit("1"))
    

    The p function argument is always received with a new name binding to the same object. The object resulting from evaluating lit("1") is not named anything (until a name is created by the formal parameter to p), so there is no name there to bind to. Conversely, there must be an object there, or otherwise p wouldn't be able to receive a value at all.

    What you could do is add a new object to use instead of a lambda to defer evaluation of a name. For example, something like:

    class DeferredNamespace(object):
        def __init__(self, namespace):
            self.__namespace = namespace
        def __getattr__(self, name):
            return DeferredLookup(self.__namespace, name)
    
    class DeferredLookup(object):
        def __init__(self, namespace, name):
            self.__namespace = namespace
            self.__name = name
        def __getattr__(self, name):
            return getattr(getattr(self.__namespace, self.__name), name)
    
    d = DeferredNamespace(locals())
    
    num = p(d.lit("1"))
    

    In this case, d.lit actually doesn't return lit, it returns a DeferredLookup object that will use getattr(locals(), 'lit') to resolve its members when they are actually used. Note that this captures locals() eagerly, which you might not want; you can adapt that to use a lambda, or better yet just create all your entities in some other namespace anyway.

    You still get the wart of the d. in the syntax, which may or may not be a deal-breaker, depending on your goals with this API.