boostsegmentation-faultboost-spiritboost-spirit-qiboost-spirit-lex

Flipping the order of subrules inside a rule in a boost::spirit grammar results in segfault


Warning; while I tried to shorten the code down, to a minimum. I still had to include quite a bit, to ensure that the required information was present.

This code, compiles files, and runs resulting in a syntax error;

name = simple_name      [ qi::_val = qi::_1 ]
     | qualified_name   [ qi::_val = qi::_1 ]
     ;

While this;

name = qualified_name   [ qi::_val = qi::_1 ]
     | simple_name      [ qi::_val = qi::_1 ]
     ;

Results in a SIGSEGV, Segmentation fault;

boost::detail::function::function_obj_invoker4<boost::spirit::qi::detail::parser_binder<boost::spirit::qi::alternative<boost::fusion::cons<boost::spirit::qi::action<boost::spirit::qi::reference<boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<char*, std::string>, boost::mpl::vector<std::string, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<false>, unsigned long>, boost::spirit::lex::lexertl::detail::data, __gnu_cxx::__normal_iterator<char*, std::string>, mpl_::bool_<true>, mpl_::bool_<false> > >, Ast::name* (), boost::spirit::unused_type, boost::spirit::unused_type, boost::spirit::unused_type> const>, boost::phoenix::actor<boost::proto::exprns_::expr<boost::proto::tagns_::tag::assign, boost::proto::argsns_::list2<boost::proto::exprns_::expr<boost::proto::tagns_::tag::terminal, boost::proto::argsns_::term<boost::spirit::attribute<0> >,0l>,boost::phoenix::actor<boost::spirit::argument<0> > >, 2l> > >,boost::fusion::cons<boost::spirit::qi::action<boost::spirit::qi::reference<boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<char*, std::string>,boost::mpl::vector<std::string, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<false>,unsigned long>, boost::spirit::lex::lexertl::detail::data, __gnu_cxx::__normal_iterator<char*,std::string>, mpl_::bool_<true>, mpl_::bool_<false> > >, Ast::name* (), ... more to come ...

Where;

simple_name = (tok.identifier) [ qi::_val = build_simple_name_(qi::_1) ];

And;

qualified_name = (name >> qi::raw_token(DOT) >> tok.identifier) [ qi::_val = build_qualified_name_(qi::_1, qi::_2) ] ;

All of these rules, return a Ast::name*();

qi::rule<Iterator, Ast::name*()> name;
qi::rule<Iterator, Ast::name*()> simple_name;
qi::rule<Iterator, Ast::name*()> qualified_name;

The helper functions are defined as;

Ast::name* build_simple_name(std::string str)
{
    return (new Ast::name_simple(Ast::identifier(str)));
}
BOOST_PHOENIX_ADAPT_FUNCTION(Ast::name*, build_simple_name_, build_simple_name, 1)

And;

Ast::name* build_qualified_name(Ast::name* name, std::string str)
{
    std::list<Ast::identifier> qualified_name = Ast::name_to_identifier_list(name);
    qualified_name.push_back(Ast::identifier(str));

    return (new Ast::name_qualified(qualified_name));
}
BOOST_PHOENIX_ADAPT_FUNCTION(Ast::name*, build_qualified_name_, build_qualified_name, 2)

The lexer definitions used are defined as;

lex::token_def<std::string> identifier = "{JAVA_LETTER}{JAVA_LETTER_OR_DIGIT}*";

And;

('.', DOT)

Where the patterns {JAVA_LETTER} and {JAVA_LETTER_OR_DIGIT} are defined as;

("DIGIT",           "[0-9]")
("LATIN1_LETTER",   "[A-Z]|[a-z]")
("JAVA_LETTER",     "{LATIN1_LETTER}|$|_")
("JAVA_LETTER_OR_DIGIT", "{JAVA_LETTER}|{DIGIT}")

My input, is a simple string;

package a.D;

Which lexes to the tokens;

Keywords : package
Identifier : a
Delimiters : .
Identifier : D
Delimiters : ;

Where the first example (with simple_name first), throws a syntax error as;

Syntax Error at line 1:
package a.D;
          ^^

And the last example simply throws an segfault, with the error posted previously.

Clearly the second example is what I want, as it should try to match the complex expression, before the simple one.

Does anyone see why the code crashes, or how I would go about figuring out? - Also should this be at code review?


Solution

  • The problem is that you have a left recursive grammar and that cannot be used with Boost.Spirit. What you have is basically:

    name = identifier | name >> dot >> identifier;
    

    As you can see here, in order to remove the left recursion when you have something like:

    A = A >> alpha | beta;
    

    You need to create 2 new "rules":

    A = beta >> A_tail;
    A_tail = eps | alpha >> A_tail;
    

    In your case:

    A := name
    alpha := dot >> identifier
    beta := identifier
    

    So your "rules" would be:

    name = identifier >> name_tail;
    name_tail = eps | dot >> identifier >> A_tail;
    

    If you look closely at name_tail you can see that it literally means: either nothing or dot >> identifier followed by either nothing or dot >> identifier and so on. That means that name_tail is:

    name_tail = *(dot >> identifier);
    

    So finally your name rule would be:

    name = identifier >> *(dot >> identifier);
    

    All of this is correct, but there is a very good chance that it will not work with your attributes.