[SOLVED] Up-to-date Prolog implementation benchmarks?

Up-to-date Prolog implementation benchmarks?

Are there any up-to-date Prolog implementation benchmarks (with results)?

I found this on the mercury web site. Surprisingly, it shows a 20-fold gap between swi-prolog and Aquarius. I suspect that these results are pretty old. Does this gap still hold? Personally, I'd also like to see some comparisons with the occurs check turned on, since it has a major impact on performance, and some compilers might be better than others at optimizing it away.

Of the more recent comparisons, I found this claim that gnu-prolog is 2x faster than SWI, and YAP is 4x faster than SWI on one specific code base.

Edit:

a specific case where the occurs check is needed for a real world problem

Sure: type inference in Haskell, OCaml, Swift or theorem provers such as this one. I also think the burden is on the programmer to prove that his code doesn't need the occurs check. Tests can only prove that you do need it, not that you don't need it.

Solution

There's also a classical set of Prolog benchmarks that can be used for comparing Prolog implementations. Some Prolog systems include them (e.g. SWI-Prolog). They are also included in the Logtalk distribution, which allows running them with the supported backends:

https://github.com/LogtalkDotOrg/logtalk3/tree/master/examples/bench

In the current Logtalk git version, you can start it with the backend you want to benchmark and use the queries:

?- {bench(loader)}.
...
?- run.

These will run each benchmark 1000 times are reported the total time. Use run/1 for a different number of repetitions. For example, in my macOS system using SWI-Prolog 8.3.15 I get:

?- run.
boyer: 20.897818 seconds
chat_parser: 7.962188999999999 seconds
crypt: 0.14653999999999812 seconds
derive: 0.004462999999997663 seconds
divide10: 0.002300000000001745 seconds
log10: 0.0011489999999980682 seconds
meta_qsort: 0.2729539999999986 seconds
mu: 0.04534600000000211 seconds
nreverse: 0.016964000000001533 seconds
ops8: 0.0016230000000021505 seconds
poly_10: 1.9540520000000008 seconds
prover: 0.05286200000000463 seconds
qsort: 0.030829000000004214 seconds
queens_8: 2.2245050000000077 seconds
query: 0.11675499999999772 seconds
reducer: 0.00044199999999960937 seconds
sendmore: 3.048624999999994 seconds
serialise: 0.0003770000000073992 seconds
simple_analyzer: 0.8428750000000065 seconds
tak: 5.495768999999996 seconds
times10: 0.0019139999999993051 seconds
unify: 0.11229400000000567 seconds
zebra: 1.595203000000005 seconds
browse: 31.000829000000003 seconds
fast_mu: 0.04102400000000728 seconds
flatten: 0.028527999999994336 seconds
nand: 0.9632950000000022 seconds
perfect: 0.36678499999999303 seconds
true.

For SICStus Prolog 4.6.0 I get:

| ?- run.
boyer: 3.638 seconds
chat_parser: 0.7650000000000006 seconds
crypt: 0.029000000000000803 seconds
derive: 0.0009999999999994458 seconds
divide10: 0.001000000000000334 seconds
log10: 0.0009999999999994458 seconds
meta_qsort: 0.025000000000000355 seconds
mu: 0.004999999999999893 seconds
nreverse: 0.0019999999999997797 seconds
ops8: 0.001000000000000334 seconds
poly_10: 0.20500000000000007 seconds
prover: 0.005999999999999339 seconds
qsort: 0.0030000000000001137 seconds
queens_8: 0.2549999999999999 seconds
query: 0.024999999999999467 seconds
reducer: 0.001000000000000334 seconds
sendmore: 0.6079999999999997 seconds
serialise: 0.0019999999999997797 seconds
simple_analyzer: 0.09299999999999997 seconds
tak: 0.5869999999999997 seconds
times10: 0.001000000000000334 seconds
unify: 0.013000000000000789 seconds
zebra: 0.33999999999999986 seconds
browse: 4.137 seconds
fast_mu: 0.0070000000000014495 seconds
nand: 0.1280000000000001 seconds
perfect: 0.07199999999999918 seconds
yes

For GNU Prolog 1.4.5, I use the sample embedding script in logtalk3/scripts/embedding/gprolog to create an executable that includes the bench example fully compiled:

| ?- run.
boyer: 9.3459999999999983 seconds
chat_parser: 1.9610000000000003 seconds
crypt: 0.048000000000000043 seconds
derive: 0.0020000000000006679 seconds
divide10: 0.00099999999999944578 seconds
log10: 0.00099999999999944578 seconds
meta_qsort: 0.099000000000000199 seconds
mu: 0.012999999999999901 seconds
nreverse: 0.0060000000000002274 seconds
ops8: 0.00099999999999944578 seconds
poly_10: 0.72000000000000064 seconds
prover: 0.016000000000000014 seconds
qsort: 0.0080000000000008953 seconds
queens_8: 0.68599999999999994 seconds
query: 0.041999999999999815 seconds
reducer: 0.0 seconds
sendmore: 1.1070000000000011 seconds
serialise: 0.0060000000000002274 seconds
simple_analyzer: 0.25 seconds
tak: 1.3899999999999988 seconds
times10: 0.0010000000000012221 seconds
unify: 0.089999999999999858 seconds
zebra: 0.63499999999999979 seconds
browse: 10.923999999999999 seconds
fast_mu: 0.015000000000000568 seconds

(27352 ms) yes

I suggest you try these benchmarks, running them on your computer, with the Prolog systems that you want to compare. In doing that, remember that this is a limited set of benchmarks, not necessarily reflecting the actual relative performance in non-trivial applications.

Ratios:

              SICStus/SWI   GNU/SWI

boyer               17.4%     44.7%
browse              13.3%     35.2%
chat_parser          9.6%     24.6%
crypt               19.8%     32.8%
derive              22.4%     44.8%
divide10            43.5%     43.5%
fast_mu             17.1%     36.6%
flatten                 -         -
log10               87.0%     87.0%
meta_qsort           9.2%     36.3%
mu                  11.0%     28.7%
nand                13.3%         -
nreverse            11.8%     35.4%
ops8                61.6%     61.6%
perfect             19.6%         -
poly_10             10.5%     36.8%
prover              11.4%     30.3%
qsort                9.7%     25.9%
queens_8            11.5%     30.8%
query               21.4%     36.0%
reducer            226.2%      0.0%
sendmore            19.9%     36.3%
serialise          530.5%   1591.5%
simple_analyzer     11.0%     29.7%
tak                 10.7%     25.3%
times10             52.2%     52.2%
unify               11.6%     80.1%
zebra               21.3%     39.8%

P.S. Be sure to use Logtalk 3.43.0 or later as it includes portability fixes for the bench example, including for GNU Prolog, and a set of basic unit tests.