So I've been messing around with instaparse and it's been great. However, I've been trying to avoid leaning on regexes as a crutch, and that has made my grammars a bit more verbose. For the sake of readability, let's just say #'[A-Za-z]' is actually written out in the 'A' | 'B' | ... format.
(require 'instaparse.core)
(def myprsr (instaparse.core/parser
  "word = (ltr | num)+;
   <ltr> = #'[A-Za-z]';
   <num> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))
(myprsr "foo123") ;; -> [:word "f" "o" "o" "1" "2" "3"]
Is there any way, without resorting to #'[A-Za-z]+' and #'[0-9]+', to get leaves like [:word "foo123"], or [:number "123"] if I had made number a top-level rule, so that I don't have to concatenate them during post-parse processing?
There's currently no way (besides regexes) to merge those strings automatically during the parse. I would recommend doing the concatenation in the transform map you pass to insta/transform.
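A minimal sketch of that transform-map approach, assuming instaparse is on the classpath. The grammar is the one from the question (with the letter class spelled [A-Za-z]), and `merge-leaves` is a hypothetical name. insta/transform calls the function keyed by a node's tag with that node's children as separate arguments, so the :word function only has to join the single-character leaves.

```clojure
(require '[instaparse.core :as insta])

;; The question's grammar: <ltr> and <num> are hidden, so their
;; single-character matches are spliced directly into the :word node.
(def myprsr
  (insta/parser
    "word = (ltr | num)+;
     <ltr> = #'[A-Za-z]';
     <num> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))

;; Transform map (hypothetical name): keys are rule tags; each function
;; receives that node's children as separate arguments.
(def merge-leaves
  {:word (fn [& leaves] [:word (apply str leaves)])})

(insta/transform merge-leaves (myprsr "foo123"))
;; => [:word "foo123"]
```

Because `apply str` stringifies whatever it receives, this only works while every child of :word is a plain string. If :word could also contain visible sub-rules, you would transform those into strings first, or join only the string children.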
There's also nothing wrong with using regexes in a case this simple. Greedily consuming a whole run of letters or a whole run of digits can't cause us to miss a valid parse, so regexes are perfectly acceptable here (and more performant).
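For comparison, here is a sketch of the regex-based grammar, again assuming instaparse is available. The rule names `letters` and `number` and the var `regex-parser` are illustrative, not taken from the question. Each regex consumes a whole run of letters or digits in one step, so the leaves come out already joined and no transform is needed.

```clojure
(require '[instaparse.core :as insta])

;; <letters> is hidden, so its match is spliced into :word as one string;
;; number is a visible rule, so its match gets its own [:number ...] node.
(def regex-parser
  (insta/parser
    "word = (letters | number)+;
     <letters> = #'[A-Za-z]+';
     number = #'[0-9]+';"))

(regex-parser "foo123")
;; => [:word "foo" [:number "123"]]
```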