I want to create (just for fun) a custom HTML parser based on Lark. I want it to be forgiving like the parsers in browsers, so I defined a catch-all terminal for strings. My only problem is that even though I have defined
STRING: /.+/
and then placed that as an option for start, it errors out saying, for example, that a { hasn't been closed when processing rule "STRING", or that an unterminated string literal has been encountered while processing rule "STRING".
The fact that it always says the error happened while processing rule "STRING" tells me that Lark realized it should interpret what it saw as a "STRING", but it somehow still errored out.
Putting the plus on the outside of the RegEx does not change the behaviour at all.
In case you wonder why I used { as an example: I tested the catch-all string in another syntax, and I haven't started the actual HTML parser project yet.
I just found the issue: it was in my transformer.
The grammar correctly captured the input as a "STRING", but my transformer then passed it to eval() to interpret the escape sequences and remove the quotes. Because Python's eval() follows Python's syntax and not mine, it threw an error. Handling the string differently in the transformer removes the error.
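For anyone hitting the same thing, here is a minimal sketch of what "handling the string differently" could look like. It is not my exact transformer code: the helper names `unquote` and `eval_rejects` and the small escape table are assumptions made up for illustration, and the set of supported escapes is deliberately tiny. The idea is to strip the quotes and decode escapes by hand, so that input which is not valid Python never reaches eval().

```python
import re

# Assumed, deliberately small escape table; extend it to match your own syntax.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def unquote(text: str) -> str:
    """Strip one pair of matching quotes (if present) and decode known escapes.

    Unknown escape sequences and unbalanced quotes are left untouched
    instead of raising, which is the forgiving behaviour wanted here.
    """
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1]
    # Replace a backslash plus one character using the table above;
    # fall back to the original two characters when the escape is unknown.
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), text)


def eval_rejects(text: str) -> bool:
    """Return True if Python's eval() refuses the text with a SyntaxError."""
    try:
        eval(text)
    except SyntaxError:
        return True
    return False


# The two inputs from the question are rejected by eval() but pass through unquote().
for sample in ["{", '"abc']:
    print(repr(sample), eval_rejects(sample), repr(unquote(sample)))
```

In a Lark transformer, a function like `unquote` would be called on the token's text inside the callback for STRING. That wiring is left out here so the snippet runs without Lark installed.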
[EDIT]: I didn't find it there immediately because my parsing loop catches all errors and prints only the error message to stdout, so I never saw a stack trace.
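A rough sketch of a parsing loop that would have shown me the culprit right away: it still catches every error, but keeps the full traceback instead of only the message. The names `parse_all` and `naive_transform` are hypothetical stand-ins (the latter mimics a transformer callback that uses eval()), not code from my project.

```python
import traceback


def naive_transform(text):
    # Hypothetical stand-in for a transformer callback that uses eval().
    return eval(text)


def parse_all(inputs, parse):
    """Run parse() on each input and return (result, traceback_text) pairs.

    traceback_text is None on success; on failure it holds the full
    formatted traceback, which names the function that raised.
    """
    outcomes = []
    for text in inputs:
        try:
            outcomes.append((parse(text), None))
        except Exception:
            outcomes.append((None, traceback.format_exc()))
    return outcomes


outcomes = parse_all(["1 + 1", "{"], naive_transform)
for result, tb in outcomes:
    print(result if tb is None else tb)
```

With the traceback printed, the frame for the eval() call sits right above the SyntaxError, which makes it clear that the error comes from Python and not from the grammar.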