Given the following Lark grammar (saved as grammar.lark) and the Python code that loads it:
start: (TEXT _NEWLINE)+
TEXT: /[^\n]+/
COMMENT: /\/\/[^\n]*/ _NEWLINE
%ignore COMMENT
_NEWLINE: (" "* "\n")+
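from lark import Lark

# Load the grammar above from grammar.lark and build an LALR(1) parser.
parser = Lark.open("grammar.lark", parser='lalr')
tree = parser.parse("""Lorem ipsum
// line comment
Text with // trailing comment
""")
print(tree.pretty())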
The above parser produces this tree:
The first line of text is parsed correctly, and the second line (which is a comment) is ignored as intended. However, the TEXT token for the last line still includes the trailing comment, which is supposed to be ignored.
I realize that it is perfectly legal in my grammar to have two consecutive slashes in a TEXT node (which should actually introduce a line comment). However, I do not know how to prevent this. Is there any way I can disallow two consecutive slashes in TEXT or give higher priority to the COMMENT terminal?
I just found a grammar that seems to work:
start: (TEXT _NEWLINE)+
TEXT: /(\/?[^\n\/])+/
COMMENT: /\/\/[^\n]*/
%ignore COMMENT
_NEWLINE: (" "* COMMENT? "\n")+
I doubt this is the most elegant solution, so I'd appreciate another answer or suggestions.
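To see why the revised TEXT pattern stops before a comment, here is a small sketch that checks both regular expressions with Python's built-in re module instead of Lark. The helper text_prefix and the sample strings other than the one from the question are made up for illustration, and Lark's lexer may behave differently in details, so treat this as a rough check of the regexes only.

```python
import re

# Patterns copied from the two grammars in the post.
OLD_TEXT = re.compile(r"[^\n]+")          # original TEXT
NEW_TEXT = re.compile(r"(\/?[^\n\/])+")   # revised TEXT

def text_prefix(pattern, line):
    """Hypothetical helper: return what `pattern` matches at the start of `line`."""
    m = pattern.match(line)
    return m.group(0) if m else ""

line = "Text with // trailing comment"
print(repr(text_prefix(OLD_TEXT, line)))  # the whole line, comment included
print(repr(text_prefix(NEW_TEXT, line)))  # 'Text with ' (stops before the //)

# Edge cases of the revised pattern: a single slash inside the text is kept,
# but a slash at the very end of the line is not matched.
print(repr(text_prefix(NEW_TEXT, "a/b")))  # 'a/b'
print(repr(text_prefix(NEW_TEXT, "a/")))   # 'a'
```

Two things stand out with the revised pattern: the space before the comment stays part of TEXT, and a line that ends in a single slash leaves that slash unmatched, which would presumably make the lexer fail on such input.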