Trying to parse a simple "\s*identifier\s+identifier\s+identifier\s*" string


I want to parse strings like these:

"blub blib blab"
"\n  \n blub \t \t \n blib  \t \n  blab \n \t \n"
"  blub   blib  blab  "

and extract "blub", "blib" and "blab" into members a, b, and c of a struct (defined in the code below).

My guess: the *qi::space and +qi::space matches are becoming part of my result set, and the members that are "not correctly" filled receive the whitespace that gets matched.

I am trying to parse some scripting language with Spirit Qi and this is my first step - after years of not using Spirit. I know that I can easily parse this with regex, but that is not my intention.

#include <cassert>
#include <string>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

struct Out
{
    Out() = default;
    Out( const std::string& a_, const std::string& b_, const std::string& c_ )
        : a( a_ ), b( b_ ), c( c_ )
    {
    }
    std::string a;
    std::string b;
    std::string c;
};
BOOST_FUSION_ADAPT_STRUCT( Out, a, b, c )

int main()
{
    qi::rule<char const*, std::string()> identifier_rule = 
        qi::char_( "a-zA-Z_" ) >> *qi::char_( "a-zA-Z0-9_" );

    qi::rule<char const*, Out()> abc_rule =
        *qi::space >> identifier_rule >> +qi::space >> identifier_rule
        >> +qi::space >> identifier_rule >> *qi::space;

    std::string test = "blub blib blab";
    //std::string test = "\n  \n blub \t \t \n blib  \t \n  blab \n \t \n";
    //std::string test = "  blub   blib  blab  ";

    Out o;
    char const* f( test.c_str() );
    char const* l( f + test.size() );
    bool const ok = qi::parse( f, l, abc_rule, o ); // keep the side effect out of assert()
    assert( ok );
    assert( o.a == "blub" );
    assert( o.b == "blib" );
    assert( o.c == "blab" );
    
    return 0;
}

Solution

  • I made a self-contained test bed covering all three cases:

    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <string_view>
    
    namespace qi = boost::spirit::qi;
    
    struct Out {
        std::string a, b, c;
    };
    BOOST_FUSION_ADAPT_STRUCT(Out, a, b, c)
    
    int main() {
        using It = std::string_view::const_iterator;
        qi::rule<It, std::string()> identifier_rule = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
    
        qi::rule<It, Out()> abc_rule         //
            = *qi::space >> identifier_rule  //
            >> +qi::space >> identifier_rule //
            >> +qi::space >> identifier_rule //
            >> *qi::space;
    
        for (std::string_view test : {
                 "blub blib blab",
                 "\n  \n blub \t \t \n blib  \t \n  blab \n \t \n",
                 "  blub   blib  blab  ",
             }) {
            It f = test.begin(), l = test.end();
            if (Out o; qi::parse(f, l, abc_rule, o))
            {
                std::cout << "Parsed:\n"
                          << "A: " << quoted(o.a) << "\n"
                          << "B: " << quoted(o.b) << "\n"
                          << "C: " << quoted(o.c) << "\n";
            } else
                std::cout << "Failed to parse " << quoted(test) << std::endl;
        }
    }
    

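    The result is what I'd expect given this grammar: a receives the leading whitespace, b the first identifier, and c the whitespace that follows it: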

    Parsed:
    A: ""
    B: "blub"
    C: " "
    Parsed:
    A: "
    
     "
    B: "blub"
    C: "
     "
    Parsed:
    A: "  "
    B: "blub"
    C: "   "
    

    What you probably EXPECT is for the qi::space matches to be left out of the attribute. You have to tell Spirit so explicitly, using qi::omit:

    qi::rule<It, Out()> abc_rule                   //
        = qi::omit[*qi::space] >> identifier_rule  //
        >> qi::omit[+qi::space] >> identifier_rule //
        >> qi::omit[+qi::space] >> identifier_rule //
        >> qi::omit[*qi::space];
    

    Prints:

    Parsed:
    A: "blub"
    B: "blib"
    C: "blab"
    Parsed:
    A: "blub"
    B: "blib"
    C: "blab"
    Parsed:
    A: "blub"
    B: "blib"
    C: "blab"
    

    WAY SIMPLER

    The idiomatic approach is to use a skipper instead¹. Then the rule becomes as simple as:

    qi::rule<It, Out()> abc_rule =
        qi::skip(qi::space)[identifier_rule >> identifier_rule >> identifier_rule];
    

    Still printing the same.


    ¹ For background, see: Boost spirit skipper issues