I want to parse strings like these:
"blub blib blab"
"\n \n blub \t \t \n blib \t \n blab \n \t \n"
" blub blib blab "
and extract "blub", "blib" and "blab" into the members a, b and c of a struct (defined in the code below).
Idea: I think the *qi::space and +qi::space matches are becoming part of my result set, and the members that are "not correctly" filled contain the whitespace that gets found.
I am trying to parse some scripting language with Spirit Qi and this is my first step - after years of not using Spirit. I know that I can easily parse this with regex, but that is not my intention.
#include <cassert>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
namespace qi = boost::spirit::qi;
struct Out
{
Out() = default;
Out( const std::string& a_, const std::string& b_, const std::string& c_ ) :
a( a_ ), b( b_ ), c( c_ )
{
}
std::string a;
std::string b;
std::string c;
};
BOOST_FUSION_ADAPT_STRUCT( Out, a, b, c )
int main()
{
qi::rule<char const*, std::string()> identifier_rule =
qi::char_( "a-zA-Z_" ) >> *qi::char_( "a-zA-Z0-9_" );
boost::spirit::qi::rule<char const*, Out()> abc_rule =
*qi::space >> identifier_rule >> +qi::space >> identifier_rule >> +qi::space >> identifier_rule >> *qi::space;
std::string test = "blub blib blab";
//std::string test = "\n \n blub \t \t \n blib \t \n blab \n \t \n";
//std::string test = " blub blib blab ";
Out o;
char const* f( test.c_str() );
char const* l( f + test.size() );
bool ok = qi::parse( f, l, abc_rule, o ); // parse outside assert() so it still runs when NDEBUG is defined
assert( ok );
assert( o.a == "blub" );
assert( o.b == "blib" );
assert( o.c == "blab" );
return 0;
}
I made a self-contained test-bed for all the cases:
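#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
namespace qi = boost::spirit::qi;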
struct Out {
std::string a, b, c;
};
BOOST_FUSION_ADAPT_STRUCT(Out, a, b, c)
int main() {
using It = std::string_view::const_iterator;
qi::rule<It, std::string()> identifier_rule = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
qi::rule<It, Out()> abc_rule //
= *qi::space >> identifier_rule //
>> +qi::space >> identifier_rule //
>> +qi::space >> identifier_rule //
>> *qi::space;
for (std::string_view test : {
"blub blib blab",
"\n \n blub \t \t \n blib \t \n blab \n \t \n",
" blub blib blab ",
}) {
It f = test.begin(), l = test.end();
if (Out o; qi::parse(f, l, abc_rule, o))
{
std::cout << "Parsed:\n"
<< "A: " << quoted(o.a) << "\n"
<< "B: " << quoted(o.b) << "\n"
<< "C: " << quoted(o.c) << "\n";
} else
std::cout << "Failed to parse " << quoted(test) << std::endl;
}
}
The result is what I'd expect:
Parsed:
A: ""
B: "blub"
C: " "
Parsed:
A: "
"
B: "blub"
C: "
"
Parsed:
A: " "
B: "blub"
C: " "
What you probably EXPECT to happen is that the whitespace matched by qi::space is left out of the attributes. You have to tell Spirit that explicitly with qi::omit:
qi::rule<It, Out()> abc_rule //
= qi::omit[*qi::space] >> identifier_rule //
>> qi::omit[+qi::space] >> identifier_rule //
>> qi::omit[+qi::space] >> identifier_rule //
>> qi::omit[*qi::space];
Prints:
Parsed:
A: "blub"
B: "blib"
C: "blab"
Parsed:
A: "blub"
B: "blib"
C: "blab"
Parsed:
A: "blub"
B: "blib"
C: "blab"
The idiomatic approach is to use a skipper instead¹. Then the rule becomes as simple as:
qi::rule<It, Out()> abc_rule =
qi::skip(qi::space)[identifier_rule >> identifier_rule >> identifier_rule];
Still printing the same.
¹ For background, see: Boost spirit skipper issues