c++ parsing boost boost-spirit-lex boost-spirit-x3

Spirit X3 parser start state?


I've been going through the Boost.Spirit X3 documentation I've been able to find (which isn't much) and think I would like to use it for my next parsing project. Notably, I have never used Boost.Spirit Classic or V2, but I have used flex/bison and ANTLR.

The format I'm looking to parse, in its most basic sense, looks like this:

unimportant
foo
bar
# BEGIN
parse this
...
# END
ignore this

Where only the text between "# BEGIN" and "# END" is parsed and everything else is completely ignored. I am trying to figure out an effective way to accomplish this in an X3 parser. Some ideas I've had:

  1. Use basic string search functions to limit the range of the parse. This seems like the worst option, as the text will be processed multiple times rather than in one pass.
  2. Look into Spirit.Lex. Again, I've had difficulty finding decent reading material on Spirit.Lex, but it seems that Lex provides lexer start states, which would be the traditional way of handling this job. As an aside, since X3 is C++14-based and Spirit.Lex is built on top of lexertl, is there a configuration option or some other way to use Spirit.Lex with the modernized lexertl14?
  3. Perhaps there is some meaningful way to handle this in X3? As the grammar is actually extremely simple, I think having a separate lexer is overkill.

Solution

  • The sample in "Using Boost Spirit to parse a text file while skipping large parts of it" applies to X3 as well:

    #if 0
    <lots of text not including "label A" or "label B">    
    label A: 34
    <lots of text not including "label A" or "label B">
    label B: 45
    <lots of text not including "label A" or "label B">
    ...
    #endif
    #include <boost/fusion/adapted/std_pair.hpp>
    #include <boost/spirit/home/x3.hpp>
    #include <boost/spirit/include/support_istream_iterator.hpp>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>
    
    namespace x3 = boost::spirit::x3;
    
    int main()
    {
        // The program parses its own source file; the #if 0 block at the top is the sample input.
        std::ifstream ifs("main.cpp");
        ifs >> std::noskipws;
    
        boost::spirit::istream_iterator f(ifs), l;
    
        std::vector<std::pair<char, int> > parsed;
        using namespace x3;
        bool ok = phrase_parse(
                f, l, 
                *seek [ eol >> "label" >> char_("A-Z") >> ':' >> int_ ],
                blank,
                parsed
            );
    
        if (ok)
        {
            std::cout << "Found:\n";
            for (auto& p : parsed)
                std::cout << "'" << p.first << "' has value " << p.second << "\n";
        }
        else
            std::cout << "Fail at: '" << std::string(f,l) << "'\n";
    }
    

    This prints:

    Found:
    'A' has value 34
    'B' has value 45
    

    Of course, you will have to adapt it to your needs. If you search for "qi seek" on Stack Overflow, you'll find more samples, some of which might be closer still to your purpose.