clojureinstaparse

Any way to speed up instaparse?


I'm trying to use instaparse on a dimacs file less than 700k in size, with the following grammar

<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'

calling like so

(def parser
  (insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))

and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?


Solution

  • The grammar is wrong. It can't be satisfied.

    Conclusion: No clause will ever be found.

    For example ...

    => (parser "60")
    Parse error at line 1, column 3:
    60
      ^
    Expected one of:
    "0"
    #"\s+"
    #"-\d+"
    #"[1-9]\d*"
    

    We can parse a literal

    => (parser "60" :start :literal)
    ("60")
    

    ... but not a clause

    => (parser "60" :start :clause)
    Parse error at line 1, column 3:
    60
      ^
    Expected one of:
    "0" (followed by end-of-string)
    #"\s+"
    #"-\d+"
    #"[1-9]\d*"
    

    Why is it so slow?

    If there is a comment:

    This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.


    I suspect that this file is actually divided into lines. And that your problems arise from trying to conflate newlines with other forms of white-space.

    May I gently point out that playing with a few tiny examples - which is all I've done - might have saved you a deal of trouble.