I've implemented a grammar in Tatsu for parsing a description of a quantum program Quipper ASCII (link). The parser works but is slow for the files I'm looking at (about 10kB-1MB size, see the resources
directory). It takes approximately 10-30 seconds to parse some files. The grammar is very straightforward and it should be possible to parse these files fairly quickly. One thing I've tried is adding cuts wherever possible to make sure there is no unneeded backtracking. The grammar is specified as
@@grammar :: Quipper
@@whitespace :: /[^\S\n]*/
# The root of any program
start::BCircuit = circuit:circuit subroutines:{subroutine} {newline} $ ;
# A sequence of gates
circuit::Circuit =
"Inputs:" ~ inputs:arity
gatelist:{gate newline}
"Outputs: "outputs:arity
;
# "Function" definitions
subroutine::Subroutine =
newline
"Subroutine:" ~ name:string newline
"Shape:" shape:string newline
"Controllable:" controllable:("yes"|"no"|"classically") newline
circuit:circuit
;
# Wires and their types.
arity = @:",".{type_assignment}+ newline ;
type_assignment::TypeAssignment = number:int ":" type:("Qbit"|"Cbit") ;
# Gate control
control_app = [controlled:controlled] [no_control:no_control];
controlled::Controlled = "with controls=[" ~ controls:",".{int}+ "]" ;
no_control::NoControl = "with nocontrol" ;
# All gates
gate
=
| qgate
| qrot
| gphase
| cnot
| cgate
| cswap
| qprep
| qunprep
| qinit
| cinit
| cterm
| qmeas
| qdiscard
| cdiscard
| dterm
| subroutine_call
| comment
;
# Gate definitions
qgate::QGate::Gate = "QGate[" ~ name:string "]" inverse:["*"] "(" qubit:int ")" > control_app;
qrot::QRot::Gate = "QRot[" ~ string "," double "](" int ")" ;
gphase::GPhase::Gate = "Gphase() with t=" ~ timestep:double >control_app "with anchors=[" ~ wires:",".{wire} "]" ;
cnot::CNo::Gate = "CNot(" ~ wire:wire ")" >control_app ;
cgate::CGat::Gate = "CGate[" ~ name:string "]" inverse:["*"] "(" wires:",".{wire} ")" no_control;
cswap::CSwap::Gate = "CSwap(" ~ wire1:wire "," wire2:wire ")" >control_app ;
qprep::QPrep::Gate = "QPrep(" ~ wire:wire ")" no_control ;
qunprep::QUnprep::Gate = "QUnprep(" ~ wire:wire ")" no_control ;
qinit::QInit::Gate = state:("QInit0" | "QInit1") ~ "(" wire:wire ")" no_control;
cinit::CInit::Gate = state:("CInit0" | "CInit1") ~ "(" wire:wire ")" no_control;
qterm::QTerm::Gate = state:("QTerm0" | "QTerm1") ~ "(" wire:wire ")" no_control;
cterm::CTerm::Gate = state:("CTerm0" | "CTerm1") ~ "(" wire:wire ")" no_control;
qmeas::QMeas::Gate = "QMeas(" ~ wire:wire ")" ;
qdiscard::QDiscard::Gate = "QDiscard(" ~ wire:wire ")" ;
cdiscard::CDiscard::Gate = "CDiscard(" ~ wire:wire ")" ;
dterm::DTerm::Gate = state:("DTerm0" | "Dterm1") ~ "(" wire:wire ")" ;
subroutine_call::SubCall::Gate = "Subroutine" ~ ["(x" repetitions:int ")"]
"[" name:string ", shape" shape:string "]"
inverse:["*"]
"(" inputs:",".{int}+ ")"
"-> (" outputs:",".{int}+ ")"
>control_app ;
comment::Comment::Gate = "Comment[" ~ text:string "](" wires:",".{wire}+ ")" ;
# Reference to an input wire and a textual description
wire::Wire = qubit:int ":" text:string ;
# Literals
string = '"'@:String_literal'"'; # Don't include the quotes.
String_literal::str = /[^"]+/ ;
int::int = /([+|-])?\d+/ ;
double::float = /(-)?\d+\.\d+e(-)?\d+/ ;
newline = /\n/ ;
I've tried profiling the code to find bottlenecks in the performance, but time is spent all over the place. I generate a parser with tatsu grammar.ebnf
and model with tatsu -g
, which I then use in a test cases to parse input files. Performance results using standard python
3.6.4, sorted by tottime
, for parsing all files in resources/PF
:
Tue Feb 27 13:35:58 2018 parser_profiling
4787639497 function calls (4611051402 primitive calls) in 3255.157 seconds
Ordered by: internal time
List reduced from 326 to 30 due to restriction <30>
ncalls tottime percall cumtime percall filename:lineno(function)
144386670 129.008 0.000 491.328 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:245(_eat_regex)
312540554 92.592 0.000 216.890 0.000 {built-in method builtins.isinstance}
15701720/80 88.741 0.000 3249.970 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:561(_invoke_rule)
34327680 87.957 0.000 370.060 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/ast.py:14(__init__)
57815550 76.052 0.000 380.881 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:322(matchre)
95427920 75.086 0.000 149.629 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:208(goto)
67455280 74.390 0.000 124.299 0.000 /Users/eddie/dev/quippy/.venv/bin/../lib/python3.6/abc.py:178(__instancecheck__)
45115140 69.378 0.000 671.723 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:260(next_token)
57815550 67.317 0.000 143.516 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:329(_scanre)
32497610 63.583 0.000 69.483 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/ast.py:115(__hasattribute__)
15701720/80 63.262 0.000 3249.976 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:501(_call)
18626000 61.979 0.000 454.538 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:644(_try)
68655360 57.334 0.000 57.334 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/ast.py:103(__setattr__)
57815550 54.505 0.000 54.505 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
126339356 49.908 0.000 49.908 0.000 /Users/eddie/dev/quippy/.venv/bin/../lib/python3.6/_weakrefset.py:70(__contains__)
72092420 48.541 0.000 170.810 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:211(move)
33410950/4018330 47.143 0.000 334.748 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/objectmodel.py:149(_adopt_children)
48867260 46.582 0.000 724.214 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:237(_next_token)
37932280 44.424 0.000 91.042 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:277(_add_cst_node)
48128890 44.030 0.000 58.493 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:256(eat_eol_comments)
36470510 43.970 0.000 234.524 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/ast.py:48(update)
24351260 43.216 0.000 120.680 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/ast.py:60(set)
18626000 41.821 0.000 545.252 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:666(_option)
75390600 40.445 0.000 40.445 0.000 {built-in method builtins.getattr}
79996390 39.995 0.000 52.367 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:223(_pos)
246258630 39.345 0.000 39.345 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:155(pos)
38860150/25041530 38.587 0.000 118.596 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/objectmodel.py:108(__cn)
95427954 38.319 0.000 38.319 0.000 {built-in method builtins.min}
15701720/80 38.148 0.000 3249.974 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:536(_recursive_call)
32060560 37.741 0.000 45.157 0.000 /usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py:59(__init__)
Even the largest contributor (_eat_regex
) cannot explain the 50 minute run-time. I also cannot find how to optimize for speed in the tatsu documentation.
Some more insight may be gained by looking at the cumulative time spent in functions. I've sorted the profiling results by cumtime
:
Tue Feb 27 13:35:58 2018 parser_profiling
4787639497 function calls (4611051402 primitive calls) in 3255.157 seconds
Ordered by: cumulative time
List reduced from 326 to 30 due to restriction <30>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 3273.337 3273.337 {built-in method builtins.exec}
1 0.000 0.000 3273.337 3273.337 <string>:1(<module>)
1 0.005 0.005 3273.337 3273.337 test_model.py:95(test_optimizer)
80 0.002 0.000 3272.954 40.912 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:182(parse)
15701720/80 31.630 0.000 3249.977 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:47(wrapper)
15701720/80 63.262 0.000 3249.976 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:501(_call)
15701720/80 38.148 0.000 3249.974 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:536(_recursive_call)
15701720/80 88.741 0.000 3249.970 40.625 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:561(_invoke_rule)
80 0.001 0.000 3174.181 39.677 /Users/eddie/dev/quippy/_parser.py:82(_start_)
380/240 0.034 0.000 3169.206 13.205 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:762(_closure)
220 0.000 0.000 3164.452 14.384 /Users/eddie/dev/quippy/_parser.py:87(block2)
220 0.003 0.000 3087.977 14.036 /Users/eddie/dev/quippy/_parser.py:121(_subroutine_)
624800/800 15.209 0.000 3076.119 3.845 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:746(_repeat)
220 0.002 0.000 3024.411 13.747 /Users/eddie/dev/quippy/_parser.py:101(_circuit_)
2578040/1272030 7.935 0.000 2970.683 0.002 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:732(_isolate)
1875860 3.029 0.000 2839.993 0.002 /Users/eddie/dev/quippy/_parser.py:108(block2)
1875860 11.120 0.000 2483.056 0.001 /Users/eddie/dev/quippy/_parser.py:217(_gate_)
1875860 25.944 0.000 1793.816 0.001 /Users/eddie/dev/quippy/_parser.py:256(_qgate_)
48867260 46.582 0.000 724.214 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:237(_next_token)
45115140 69.378 0.000 671.723 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:260(next_token)
14443300 35.990 0.000 577.765 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:598(_invoke_semantic_rule)
49442230/22727050 22.427 0.000 575.361 0.000 {built-in method builtins.next}
18626000 41.821 0.000 545.252 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:666(_option)
48128890 19.695 0.000 497.263 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:249(eat_whitespace)
32060560/13779580 12.309 0.000 493.588 0.000 /usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py:79(__enter__)
144386670 129.008 0.000 491.328 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/buffering.py:245(_eat_regex)
14443300 34.625 0.000 460.941 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/semantics.py:76(_default)
18626000 61.979 0.000 454.538 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:644(_try)
17463860 36.024 0.000 415.958 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/contexts.py:606(_token)
4018330 18.630 0.000 410.825 0.000 /Users/eddie/dev/quippy/.venv/lib/python3.6/site-packages/tatsu/objectmodel.py:19(__init__)
We can see that quite some time is spent in choosing which gate is being performed (it is a large choice statement after all). A lot of time is also spent in qgate
which is not surprising since these nodes are dominant in the code.
What are possible optimizations I can add to the grammar or parsing so that it can parse these files more quickly?
The problem may be the @@whitespace
definition, because it matches the empty string.
You could try:
@@whitespace :: /[^\S\n]+/
If that works, and the documentation misled you to use a normal closure for whitespace, please post an issue to the issue base