Following up from an earlier question, I'm a bit confused about the precedence of the /.+/
regex line; I would expect the below test to produce
line
line x
chunk abc
instead I get:
line
line x
line abc
def test_tokenizing(self):
p = Lark(r"""
_NL: /\n/
line.-1: /.+/? _NL
chunk: /abc/ _NL
start: (line|chunk)+
""", parser='lalr')
text = '\nx\nabc\n'
print(p.parse(text).pretty())
In Lark, priorities mean different things for rules and for terminals.
Just a quick reminder, rules have lowercase names, while terminals have UPPERCASE names.
In LALR mode, priorities on rules only affect which one is chosen in case of a reduce/reduce collision. It has no effect on the terminals inside it.
What you want is to change the priority on the terminal itself:
def test_tokenizing():
p = Lark(r"""
_NL: /\n/
line: EVERYTHING? _NL
EVERYTHING.-1: /.+/
chunk: /abc/ _NL
start: (line|chunk)+
""", parser='lalr')
text = '\nx\nabc\n'
print(p.parse(text).pretty())