python lex ply

PLY Lex: ID could be anything


I have the following simple format:

BLOCK ID {
    SUBBLOCK ID {
        SUBSUBBLOCK ID {
            SOME STATEMENTS;
        };
    };
};

I configured PLY to work with this format, but the issue is that ID can be any string, including "BLOCK", "SUBBLOCK", etc. In the lexer I define ID as:

@TOKEN(r'[a-zA-Z_][a-zA-Z_0-9]*')
def t_ID(self, t):
    t.type = self.keyword_map.get(t.value, "ID")
    return t

But this means that the word BLOCK cannot be used as a block name.

How can I overcome this issue?


Solution

  • The easiest solution is to create a non-terminal, name, and use it instead of ID in every production that needs a name, such as block : BLOCK name braced_statements:

        # Docstring is added later
        def p_name(self, p):
            p[0] = p[1]
    

    Then you compute the productions for name and assign them to p_name's docstring by executing this before you generate the parser:

        Parser.p_name.__doc__ = '\n| '.join(
            ['name : ID']
            + list(Lexer.keyword_map.values())
        )
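
To make the generated rule concrete, here is a minimal sketch of the docstring construction on its own, without importing PLY. The contents of keyword_map and the helper build_name_rule are assumptions made for illustration and are not from the original post. It also assumes Python 3, where Parser.p_name is a plain function whose __doc__ can be assigned.

```python
# Hypothetical keyword map: reserved word -> token type name.
# The real Lexer.keyword_map is not shown in the post; this is an assumption.
keyword_map = {
    "BLOCK": "BLOCK",
    "SUBBLOCK": "SUBBLOCK",
    "SUBSUBBLOCK": "SUBSUBBLOCK",
}


class Parser:
    # The docstring (the grammar rule) is assigned after the class body.
    def p_name(self, p):
        p[0] = p[1]


def build_name_rule(token_types):
    """Build a rule in which `name` matches ID or any keyword token."""
    # sorted(set(...)) removes duplicate token types and keeps the order stable.
    alternatives = ["name : ID"] + sorted(set(token_types))
    return "\n| ".join(alternatives)


# Must run before the parser tables are generated from the Parser class.
Parser.p_name.__doc__ = build_name_rule(keyword_map.values())
print(Parser.p_name.__doc__)
```

With this map, the resulting rule has four alternatives: ID, BLOCK, SUBBLOCK and SUBSUBBLOCK. A reserved word is therefore accepted wherever the grammar expects a name, while the lexer keeps reporting it as a keyword token.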