NOTE: Given the answer, my way of using the predicate was wrong, so the title is misleading.
I'm writing a parser in Python for an ancient language. The language contains syntax such as
want <decimal>
need <hex>
It is ambiguous whether a run of digits should be resolved as decimal or hexadecimal. The preceding keyword ("want" or "need") tells me which integer format to expect. Of course, the real language is more involved than this example. I first wrote the .g4 file with predicates in Java, since I was developing the grammar and validating it with antlr4-parse. However, after I ported the working .g4 file to Python predicates, I got a parse error.
test.g4
grammar test;
@parser::members {
def setExpectDec(self, value):
    self.context.expectDec = value
def setExpectHex(self, value):
    self.context.expectHex = value
}
testFile: (command)* EOF;
command:
want
| need;
want:
'want' length;
length:
{self.setExpectDec(True)} numberExpr {self.setExpectDec(False)};
need:
'need' size;
size:
{self.setExpectHex(True)} numberExpr {self.setExpectHex(False)};
numberExpr:
{self.context.expectHex}? NUMBER_HEX
| {self.context.expectDec}? NUMBER_DEC;
NUMBER_HEX:
[0-9a-f]+;
NUMBER_DEC:
[0-9]+;
NEWLINE:
'\r'? '\n' -> skip;
WS:
[ \t]+ -> skip; // skip spaces and tabs
I ran this command to generate .py code:
antlr4 -Dlanguage=Python3 test.g4 -visitor -no-listener
test.py
import sys
from antlr4 import FileStream, InputStream, CommonTokenStream
from testLexer import testLexer
from testParser import testParser
class ParserContext:
    def __init__(self):
        # A flag to indicate if the next number should be interpreted as decimal.
        self.expectDec = False
        # A flag to indicate if the next number should be interpreted as hexadecimal.
        self.expectHex = False
class TestReader:
    def parse_and_eval(self, testStream) -> None:
        context = ParserContext()
        lexer = testLexer(testStream)
        lexer.context = context
        tokenStream = CommonTokenStream(lexer)
        parser = testParser(tokenStream)
        parser.context = context
        tree = parser.testFile()
def main():
    if len(sys.argv) > 1:
        testStream = FileStream(sys.argv[1])
    else:
        testStream = InputStream(sys.stdin.readline())
    driver = None
    testReader = TestReader()
    testReader.parse_and_eval(testStream)
if __name__ == '__main__':
    main()
input.txt
want 10
need a
Error:
line 1:5 no viable alternative at input '10'
Stepping through the generated parser script in the debugger, I can see that the variable values are set correctly when this function is called:
<__main__.ParserContext object at 0x7fa8ff0023d0>
special variables
expectDec = True
expectHex = False
def numberExpr_sempred(self, localctx:NumberExprContext, predIndex:int):
    if predIndex == 0:
        return self.context.expectHex
    if predIndex == 1:
        return self.context.expectDec
Once I step out of this function, I get the exception shown below. It seems to be thrown from some native code that is not visible to the Python debugger. Should I expect numberExpr_sempred() to be called a second time with predIndex = 1, so that it returns True? Unfortunately, I don't have a Java development environment set up to debug the Java code, where I see a very similar code structure.
Exception has occurred: NoViableAltException
None
File "sdm_atb_testParser.py", line 393, in numberExpr
la_ = self._interp.adaptivePredict(self._input,2,self._ctx)
File "sdm_atb_testParser.py", line 266, in length
self.numberExpr()
File "sdm_atb_testParser.py", line 225, in want
self.length()
File "sdm_atb_testParser.py", line 174, in command
self.want()
File "sdm_atb_testParser.py", line 120, in testFile
self.command()
File "sdm_atb_test.py", line 21, in parse_and_eval
tree = parser.testFile()
File "sdm_atb_test.py", line 32, in main
testReader.parse_and_eval(testStream)
File "sdm_atb_test.py", line 35, in <module>
main()
antlr4.error.Errors.NoViableAltException: None
I expect to be able to parse the input.txt successfully without syntax error.
When writing the rules:
NUMBER_HEX:
[0-9a-f]+;
NUMBER_DEC:
[0-9]+;
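the lexer will never produce a NUMBER_DEC token. ANTLR's lexer works like this: it tries to match as many characters as possible, and when two or more rules match the same characters, the rule defined first wins. Input like 1234 is matched by both NUMBER_HEX and NUMBER_DEC, and NUMBER_HEX is defined first, so it will "win".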
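The tie-breaking behaviour can be imitated in plain Python to see why `want 10` fails. This is only a rough sketch, not ANTLR's actual implementation: `tokenize`, `viable_alternatives`, `HEX_FIRST` and `DEC_FIRST` are hypothetical names introduced here, and regular expressions stand in for the lexer rules. The second rule list simply swaps the order of the two number rules.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # stands in for the skipped WS/NEWLINE rules
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Strictly greater: an equally long later rule never replaces an earlier one.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

def viable_alternatives(token_type, expect_hex, expect_dec):
    """Alternatives of numberExpr whose predicate and token type both fit."""
    alternatives = [
        ("hex", expect_hex, "NUMBER_HEX"),
        ("dec", expect_dec, "NUMBER_DEC"),
    ]
    return [name for name, predicate, wanted in alternatives
            if predicate and wanted == token_type]

# Hypothetical rule lists: keywords first, then the number rules in either order.
KEYWORDS = [("WANT", r"want"), ("NEED", r"need")]
HEX_FIRST = KEYWORDS + [("NUMBER_HEX", r"[0-9a-f]+"), ("NUMBER_DEC", r"[0-9]+")]
DEC_FIRST = KEYWORDS + [("NUMBER_DEC", r"[0-9]+"), ("NUMBER_HEX", r"[0-9a-f]+")]

print(tokenize("want 10", HEX_FIRST))   # [('WANT', 'want'), ('NUMBER_HEX', '10')]
print(tokenize("want 10", DEC_FIRST))   # [('WANT', 'want'), ('NUMBER_DEC', '10')]
print(viable_alternatives("NUMBER_HEX", expect_hex=False, expect_dec=True))  # []
```

With NUMBER_HEX listed first, `10` arrives at the parser as NUMBER_HEX while only the decimal predicate is true, so neither alternative fits, which is consistent with the "no viable alternative" error in the question.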
You'd need to place the rules like this:
NUMBER_DEC:
[0-9]+;
NUMBER_HEX:
[0-9a-f]+;
and then add a parser rule like this:
hex
: NUMBER_HEX
| NUMBER_DEC
;
and use it in your other parser rules:
testFile
: command* EOF
;
command
: want
| need
;
want
: 'want' NUMBER_DEC
;
need
: 'need' hex
;
hex
: NUMBER_HEX
| NUMBER_DEC
;
NUMBER_DEC : [0-9]+;
NUMBER_HEX : [0-9a-f]+;
NEWLINE : '\r'? '\n' -> skip;
WS : [ \t]+ -> skip;
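With this grammar, a command such as `need 10` yields a NUMBER_DEC token even though its value is hexadecimal, so the base has to come from the enclosing rule rather than from the token type. A minimal sketch in plain Python, assuming a hypothetical helper `number_value` that a visitor or listener (not shown) would call with the command keyword and the token text:

```python
def number_value(keyword: str, text: str) -> int:
    """Convert number text using the base implied by the command keyword."""
    # Hypothetical mapping for this example: 'want' takes decimal, 'need' takes hex.
    bases = {"want": 10, "need": 16}
    return int(text, bases[keyword])

print(number_value("want", "10"))  # 10
print(number_value("need", "10"))  # 16
print(number_value("need", "a"))   # 10
```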