
When can you use a name before it's defined?


In SLY there is an example for writing a calculator (reproduced from calc.py here):

from sly import Lexer

class CalcLexer(Lexer):
    tokens = { NAME, NUMBER }
    ignore = ' \t'
    literals = { '=', '+', '-', '*', '/', '(', ')' }

    # Tokens
    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

    @_(r'\d+')
    def NUMBER(self, t):
        t.value = int(t.value)
        return t

    @_(r'\n+')
    def newline(self, t):
        self.lineno += t.value.count('\n')

    def error(self, t):
        print("Illegal character '%s'" % t.value[0])
        self.index += 1

This looks like a bug, because NAME and NUMBER are used in the tokens set before they have been defined. Yet there is no NameError, and the code runs fine. How does that work? When can you reference a name before it has been defined?


Solution

  • Python knows four kinds of direct name lookup: builtins / program global, module global, function/closure body, and class body. The names NAME and NUMBER are resolved in a class body, and are therefore subject to the rules of that kind of scope.

    The class body is evaluated in a namespace provided by the metaclass (through its __prepare__ method), and that namespace can implement arbitrary semantics for name lookups. Specifically, the sly Lexer class uses LexerMeta as its metaclass, which in turn uses a LexerMetaDict as the namespace; this namespace creates new tokens for undefined names.

    class LexerMetaDict(dict):
        ...
        def __getitem__(self, key):
            if key not in self and key.split('ignore_')[-1].isupper() and key[:1] != '_':
                return TokenStr(key, key, self.remap)
            else:
                return super().__getitem__(key)
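
To see the mechanism in isolation, here is a minimal sketch that is not SLY's code: AutoNameDict, AutoNameMeta and Demo are hypothetical names, and the rule "undefined all-caps names evaluate to their own name as a string" is a simplification of what LexerMetaDict does with TokenStr.

```python
class AutoNameDict(dict):
    """Hypothetical namespace: an undefined ALL-CAPS name evaluates to its own name."""

    def __getitem__(self, key):
        # Class bodies look names up in this mapping first. Raising KeyError
        # (via dict.__getitem__) lets Python fall back to globals and builtins.
        if key not in self and key.isupper() and not key.startswith('_'):
            return key
        return super().__getitem__(key)


class AutoNameMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        # The returned mapping is used as the local namespace of the class body.
        return AutoNameDict()

    def __new__(meta, name, bases, namespace):
        # Hand a plain dict to type.__new__ once the body has been executed.
        return super().__new__(meta, name, bases, dict(namespace))


class Demo(metaclass=AutoNameMeta):
    tokens = {NAME, NUMBER}            # neither name is defined yet
    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'   # a normal assignment afterwards


print(sorted(Demo.tokens))  # ['NAME', 'NUMBER']

# Names that do not match the rule still fail in the usual way.
try:
    class Broken(metaclass=AutoNameMeta):
        value = undefined_name
except NameError:
    lowercase_raises = True
else:
    lowercase_raises = False
```

Only lookups inside the class body go through the custom mapping; code inside methods of the class resolves names through the ordinary function and module scopes.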
    

    LexerMeta is also responsible for adding the _ function to the namespace, which is why the class body can use @_(...) without importing it.

    class LexerMeta(type):
        '''
        Metaclass for collecting lexing rules
        '''
        @classmethod
        def __prepare__(meta, name, bases):
            d = LexerMetaDict()
    
            def _(pattern, *extra):
                ...
    
            d['_'] = _
            d['before'] = _Before
            return d
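
The same __prepare__ hook can be tried out with a small stand-alone sketch. Everything here is hypothetical (RuleMeta, Rules, and the pattern attribute are illustration names, not SLY's API), and removing the helper again in __new__ is a choice made for this sketch.

```python
class RuleMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        namespace = {}

        def _(pattern):
            # Decorator factory: attach the pattern to the decorated function.
            def decorate(func):
                func.pattern = pattern
                return func
            return decorate

        # Pre-populate the class body namespace, so `_` needs no import.
        namespace['_'] = _
        return namespace

    def __new__(meta, name, bases, namespace):
        namespace = dict(namespace)
        # Sketch choice: drop the helper so it does not become a class attribute.
        del namespace['_']
        # Collect every attribute that was tagged by the decorator.
        namespace['rules'] = {
            key: value.pattern
            for key, value in namespace.items()
            if hasattr(value, 'pattern')
        }
        return super().__new__(meta, name, bases, namespace)


class Rules(metaclass=RuleMeta):
    @_(r'\d+')
    def NUMBER(self, text):
        return int(text)


print(Rules.rules)  # {'NUMBER': '\\d+'}
```

Because the helper exists only in the class body namespace, using `_` at module level or inside a method would still raise a NameError unless it is defined there separately.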