pythonparsingpyparsing

what next after pyparsing?


I have a huge grammar developed for pyparsing as part of a large, pure Python application. I have reached the limit of performance tweaking and I'm at the point where the diminishing returns make me start to look elsewhere. Yes, I think I know most of the tips and tricks and I've profiled my grammar and my application to dust.

What next?

I hope to find a parser that gives me the same readability, usability (I'm using many advanced features of pyparsing such as parse-actions to start the post processing of the input which is being parsed) and python integration but at 10× the performance.

I love the fact the the grammar is pure Python.

All my basic blocks are regular expressions, so reusing them would be nice.

I know I can't have everything so I am willing to give up on some of the features I have today to get to the requested 10× performance.

Where do I go from here?


Solution

  • It looks like the pyparsing folks have anticipated your problem. From https://github.com/pyparsing/pyparsing/blob/master/docs/HowToUsePyparsing.rst :

    Performance of pyparsing may be slow for complex grammars and/or large input strings. The psyco package can be used to improve the speed of the pyparsing module with no changes to grammar or program logic - observed improvments have been in the 20-50% range.

    However, as Vangel noted in the comments below, psyco is an obsolete project as of March 2012. Its successor is the PyPy project, which starts from the same basic approach to performance: use a JIT native-code compiler instead of a bytecode interpreter. You should be able to achieve similar or greater gains with PyPy if switching Python implementations will work for you.

    If you're really a speed demon, but want to keep some of the legibility and declarative syntax, I'd suggest having a look at ANTLR. Probably not the Python-generating backend; I'm skeptical whether that's mature or high-performance enough for your needs. I'm talking about the goods: the C backend that started it all.

    Wrap a Python C extension module around the entry point to the parser, and turn it loose.

    Having said that, you'll be giving up a lot in this transition: basically any Python you want to do in your parser will have to be done through the C API (not altogether pretty). Also, you'll have to get used to very different ways of doing things. ANTLR has its charms, but it's not based on combinators, so there's not the easy and fluid relationship between your grammar and your language that there is in pyparsing. Plus, it's its own DSL, much like lex/yacc, which can present a learning curve – but, because it's LL based, you'll probably find it easier to adapt to your needs.