c++boost-spirit boost-spirit-qi boost-spirit-lex

Boost::spirit illegal_backtracking exception

I use Boost.Spirit.Lex and .Qi for a simple calculator project and (as usual) it gives me some pain to debug and use. The debug prints:

<expression>
  <try>boost::spirit::multi_pass::illegal_backtracking

This exception is thrown and I can't understand why. I use macros in my code and it would be a pain to give a minimal example so I give the whole project. Just do "make" at the root, and then launch ./sash, a prompt will appear, if you want to test just do "-echo $5-8".

It seems that Google didn't find any similar problems about this exception...

The parser is in arithmetic/, and the call of the parser is at the end of arithmetic/evaluator.cpp

Any helps greatly appreciate.

Solution

Your code is breaking because BOOST_SPIRIT_QI_DEBUG as well as the on_error<> handler seem to use iterators after they might have been invalidated.

To be honest, I'm not completely sure how this could happen.

Background

AFAICT lexertl uses spirit::multipass<> with a split_functor input policy and a split_std_deque storage policy ^[1].

Now, (luckily?) the checking policy is buf_id_check which means that the iterator will check for invalidation at the time of dereference.

Iterators are expected to be invalidated if

the iterator is derefenced, growing the buffer to >16 tokens and the iterator is the only one referring to the shared state.
or somewhere along the line clear_queue is called explicitely (e.g. from the flush_multi_path primitive in the Spirit Repository)

Honestly I don't see any of these two conditions being met. A quick and dirty

token_iterator_type clone = iter; // just to make it non-unique...

in evaluator.cpp doesn't make a difference (ruling out reason #1)

Temporary disabling the docheck implementation in the buf_id_check_policy made valgrind point out that on_error<> and BOOST_SPIRIT_DEBUG* are causing invalid memory references. Commenting both indeed makes all problems go away (and the eval_expression now works).

However, this is likely not your preferred solution.

Proposed solution

Since

you're working on a fixed, in-memory container representing the input you don't really need multi_pass behaviour emulation
you're using a trivial grammar, you don't really benefit from lexertl - while you are getting a lot of added complexity (as you can see)

I've quickly refactored some code: https://github.com/sehe/sash-refactor/commits/master

commit dec31496 sanity - lets do without macros
4 files changed, 59 insertions(+), 146 deletions(-)
commit 6056574c dead code, excess scope, excess instantiation
5 files changed, 38 insertions(+), 62 deletions(-)
commit 99d441db remove lexer
9 files changed, 25 insertions(+), 177 deletions(-)

Now, you will find that your code is generally much simpler, also much shorter, not running into multi_pass limits and you can still have SPIRIT_DEBUG as well as on_error handling :) In the end

binary size in -g3 is reduced from 16Mb to 6.5Mb
a net 263 lines of code have been removed
more importantly, it works

Here's some samples (without debug output):

$ ./sash <<< '-echo $8-9'
    -1
    Warning: Empty environment variable "8-9".
$ ./sash <<< '-echo $8\*9'
    72
    Warning: Empty environment variable "8*9".
$ ./sash <<< '-echo $8\*(9-1)'
    64
    Warning: Empty environment variable "8*(9-1)".
$ ./sash <<< '-echo $--+-+8\*(9-1)'
    -64
    Warning: Empty environment variable "--+-+8*(9-1)".

^[1] Which, despite it's name, buffers previously seen tokens in a std::vector<>