yacc lex ply

PLY conditional rule


How can I handle one-to-many rules in PLY?

tokens = (
    'VAR1',
    'VAR2',
    'TRUE',
    'SINGLE_CHAR',
)

def t_VAR1(t):
    r'var1'
    return t

def t_VAR2(t):
    r'var2'
    return t

def t_TRUE(t):
    r't|T'
    return t

def t_SINGLE_CHAR(t):
    r'[A-Za-z]'
    return t

Now suppose I have two rules:

def p_expression_variable2(p):
    '''variable2 : VAR2 SINGLE_CHAR'''
    print("got VAR2")

def p_expression_variable1(p):
    '''variable1 : VAR1 TRUE'''
    print("got VAR1")

I want var1 T to match p_expression_variable1 and var2 T to match p_expression_variable2. But because the characters matched by t_TRUE are a subset of those matched by t_SINGLE_CHAR, T is always tokenized as TRUE, so var2 T produces a syntax error.

I am a novice in lex and yacc and would like to know how to write rules for a case like this. I know it can be handled with conditional lexing (defining two states), but is there a way to handle it with a single state?


Solution

  • Not in the way you want to do it.

    In the yacc/lex parsing model lexical analysis is performed without regard to the parser state. That's the point of separating syntactic and lexical analysis into distinct components. If you wanted them to interact in the way you suggest, you would have to use a scannerless parser, but that involves a certain cost, both in grammar complexity and in the parsing algorithm, because the resulting grammar is unlikely to work with only a single character lookahead.
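
    To see why the token type cannot depend on what came before, here is a minimal stand-in for the question's lexer that uses only Python's `re` module. It is not PLY itself; `TOKEN_RE` and `tokenize` are hypothetical names, and skipping spaces and tabs is an assumption. The alternatives are tried in the same order as the question's `t_*` functions.

```python
import re

# Hypothetical stand-in for the question's lexer (not PLY itself):
# one master pattern, alternatives tried in definition order.
TOKEN_RE = re.compile(r'''
    (?P<VAR1>var1)
  | (?P<VAR2>var2)
  | (?P<TRUE>t|T)
  | (?P<SINGLE_CHAR>[A-Za-z])
''', re.VERBOSE)

def tokenize(text):
    """Return (type, value) pairs; spaces and tabs are skipped (an assumption)."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos] in ' \t':
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r}")
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

print(tokenize("var1 T"))  # [('VAR1', 'var1'), ('TRUE', 'T')]
print(tokenize("var2 T"))  # [('VAR2', 'var2'), ('TRUE', 'T')]
```

    The `T` comes out as `TRUE` in both inputs, because the tokenizer never looks at the preceding token.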

    With the lex/yacc framework, you could use multiple lexical states, but that requires either feedback from the parser to the lexical analyser, generally using mid-rule actions, which PLY doesn't support, or reproducing part of the syntactic analysis inside a hand-built state machine in the lexical analyser. Both of these solutions are inelegant, unscalable, and unnecessarily bulky. (Nonetheless, they are more common than one might hope.)
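
    For comparison only, the hand-built state machine variant might look roughly like the sketch below. It assumes PLY 3.x is installed; the state name `anychar`, the whitespace and error rules, and the `token_types` helper are hypothetical additions. Note how the lexer has to know that a character follows `var2`, which duplicates what the grammar already says.

```python
import ply.lex as lex

tokens = ('VAR1', 'VAR2', 'TRUE', 'SINGLE_CHAR')

# Hypothetical exclusive state, entered right after VAR2 is seen.
states = (('anychar', 'exclusive'),)

t_ignore = ' \t'           # assumption: whitespace separates tokens
t_anychar_ignore = ' \t'   # exclusive states need their own ignore rule

def t_VAR1(t):
    r'var1'
    return t

def t_VAR2(t):
    r'var2'
    t.lexer.begin('anychar')   # grammar knowledge leaking into the lexer
    return t

def t_TRUE(t):
    r'[tT]'
    return t

def t_SINGLE_CHAR(t):
    r'[A-Za-z]'
    return t

def t_anychar_SINGLE_CHAR(t):
    r'[A-Za-z]'
    t.lexer.begin('INITIAL')   # one character consumed, back to normal
    return t

def t_error(t):
    raise SyntaxError(f"illegal character {t.value[0]!r}")

def t_anychar_error(t):
    raise SyntaxError(f"illegal character {t.value[0]!r}")

lexer = lex.lex()

def token_types(text):
    """Hypothetical helper: list the token types produced for text."""
    lexer.begin('INITIAL')
    lexer.input(text)
    return [tok.type for tok in lexer]
```

    With this lexer, `token_types("var2 T")` yields `VAR2` followed by `SINGLE_CHAR`, while `token_types("var1 T")` yields `VAR1` followed by `TRUE`. Every new context that allows an arbitrary character would need another state transition like the one in `t_VAR2`.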

    What you can do is to make the lexical analysis unambiguous:

    def t_TRUE(t):
        r'[tT]'
        return t
    
    def t_SINGLE_CHAR(t):
        r'[A-SU-Za-su-z]'
        return t
    

    and implement a kind of lexical fallback in the parser:

    def p_any_char(p):
        '''any_char : SINGLE_CHAR
                    | TRUE
        '''
        p[0] = p[1]
    
    def p_expression_variable2(p):
        '''variable2 : VAR2 any_char'''
        print("got VAR2")
    
    def p_expression_variable1(p):
        '''variable1 : VAR1 TRUE'''
        print("got VAR1")
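
    Putting the two pieces together, a complete program might look like the sketch below. It assumes PLY 3.x is installed. The `statement` start rule, the `t_ignore` whitespace rule, the error handlers, and the `parse` helper are not part of the original and are added only so the example runs. The actions return tuples instead of printing, which makes the results easier to check.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('VAR1', 'VAR2', 'TRUE', 'SINGLE_CHAR')

t_ignore = ' \t'   # assumption: whitespace separates tokens

def t_VAR1(t):
    r'var1'
    return t

def t_VAR2(t):
    r'var2'
    return t

def t_TRUE(t):
    r'[tT]'
    return t

def t_SINGLE_CHAR(t):
    r'[A-SU-Za-su-z]'
    return t

def t_error(t):
    raise SyntaxError(f"illegal character {t.value[0]!r}")

start = 'statement'   # hypothetical start symbol, not in the original

def p_statement(p):
    '''statement : variable1
                 | variable2'''
    p[0] = p[1]

def p_any_char(p):
    '''any_char : SINGLE_CHAR
                | TRUE'''
    p[0] = p[1]

def p_expression_variable2(p):
    '''variable2 : VAR2 any_char'''
    p[0] = ('VAR2', p[2])

def p_expression_variable1(p):
    '''variable1 : VAR1 TRUE'''
    p[0] = ('VAR1', p[2])

def p_error(p):
    raise SyntaxError(f"unexpected token {p!r}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)   # debug=False skips the parser.out report

def parse(text):
    """Hypothetical helper: parse one input string and return the result."""
    return parser.parse(text, lexer=lexer)

print(parse("var1 T"))  # ('VAR1', 'T')
print(parse("var2 T"))  # ('VAR2', 'T')
print(parse("var2 x"))  # ('VAR2', 'x')
```

    An input such as `var1 x` reaches `p_error`, because `variable1` accepts only `TRUE` after `VAR1`. The lexer stays context-free; the grammar alone decides where a `TRUE` token may also stand in for an ordinary character.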