I'm trying to use instaparse on a dimacs file less than 700k in size, with the following grammar
<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'
calling like so
(def parser
(insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))
and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?
The grammar is wrong. It can't be satisfied.
file
ends with a clause
.clause
ends with a '0'
.literal
in the clause
, being a greedy reg-exp,will eat
the final '0'
. Conclusion: No clause
will ever be found.
For example ...
=> (parser "60")
Parse error at line 1, column 3:
60
^
Expected one of:
"0"
#"\s+"
#"-\d+"
#"[1-9]\d*"
We can parse a literal
=> (parser "60" :start :literal)
("60")
... but not a clause
=> (parser "60" :start :clause)
Parse error at line 1, column 3:
60
^
Expected one of:
"0" (followed by end-of-string)
#"\s+"
#"-\d+"
#"[1-9]\d*"
Why is it so slow?
If there is a comment
:
'c'
character into successive comment
s;'c'
.This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal
that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.
I suspect that this file is actually divided into lines. And that your problems arise from trying to conflate newlines with other forms of white-space.
May I gently point out that playing with a few tiny examples - which is all I've done - might have saved you a deal of trouble.