clojure ebnf instaparse

Instaparse: series of numbers or letters as one leaf?


So I've been messing around with instaparse and it's been great. However, I've been trying to avoid using regexes as a crutch, and that has made my grammars a bit more verbose. For the sake of keeping this readable, let's just say #'[A-z]' is actually written in the 'A' | 'B' | etc. format.

(require '[instaparse.core :as insta])

(def myprsr
  (insta/parser
    "word = (ltr | num)+;
     <ltr> = #'[A-z]';
     <num> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))

(myprsr "foo123") ;; -> [:word "f" "o" "o" "1" "2" "3"]

Is there any way, without resorting to #'[A-z]+' and #'[0-9]+', to get leaves out like [:word "foo123"] or [:number "123"] (if I had made number a top-level rule), so that I don't have to concatenate them as part of the post-parse processing?


Solution

  • There's currently no way (besides regexes) to automatically merge those strings during the parse. I would recommend doing the concatenation in the map you pass to insta/transform.

    There's also nothing wrong with using regexes in a case this simple. Greedily consuming all the letters or all the digits can't cause us to miss a possible parse, so regexes are acceptable here (and more performant).
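
Both options from the answer can be sketched roughly as follows. This is only an illustration, not the one true way to do it: the names char-parser, merge-word and regex-parser are made up for the example, and the character class [A-Za-z0-9] in the second grammar is my assumption about what the question's 'A' | 'B' | ... alternation is meant to cover.

```clojure
(require '[instaparse.core :as insta])

;; Option 1: keep the character-level grammar from the question and
;; merge the leaves afterwards with insta/transform.
(def char-parser
  (insta/parser
    "word = (ltr | num)+;
     <ltr> = #'[A-z]';
     <num> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))

;; Hypothetical helper: receives the children of a :word node
;; (single-character strings) and rebuilds the node with one leaf.
(defn merge-word [& leaves]
  [:word (apply str leaves)])

(insta/transform {:word merge-word} (char-parser "foo123"))
;; -> [:word "foo123"]

;; Option 2: let a greedy regex produce the single leaf directly.
;; The character class is an assumption, not taken from the question.
(def regex-parser
  (insta/parser "word = #'[A-Za-z0-9]+'"))

(regex-parser "foo123")
;; -> [:word "foo123"]
```

If you only want the bare string rather than a tagged node, passing {:word str} as the transform map should be enough, since the function is applied to the node's children.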