
Boost Spirit X3: lazy parser with compile-time known parsers, referring to a previously matched value


Inspired by sehe's answer to "Boost spirit x3 - lazy parser", I tried to adapt it to one of my own problems (which is another story).
The grammar I need to implement has several ways to express numeric literals in bases 2, 8, 10 and 16. I've reduced the approach mentioned above to what I hope is a bearable minimum.

In the AST I'd like to preserve the numeric representation (integer, fractional and exponent parts) as boost::iterator_range<>, captured with x3::raw, so it can be evaluated later; only the base shall be of integer type. Honestly, I don't know the future requirements yet (I can imagine several possibilities, even evaluating it to a real/integer in the parser, but most of the time reality looks different). For simplicity, I've used std::string here:

    struct number {
        unsigned    base;
        std::string literal;
    };

Since both the base and the digits can contain embedded underscores, I've used range-v3's views::filter() function. sehe has shown another approach to handling such separated numbers in "X3 parse rule doesn't compile".
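The underscore-filtering step can be sketched in plain standard C++. This is only an illustration under stated assumptions: it swaps in std::remove_copy for range-v3's views::filter so it needs no extra dependency, and the helper names strip_separators and base_value are hypothetical, not part of the original code.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper: drop the '_' digit separators from a literal,
// similar in effect to filtering the range with range-v3's views::filter.
std::string strip_separators(std::string_view text)
{
    std::string out;
    out.reserve(text.size());
    std::remove_copy(text.begin(), text.end(), std::back_inserter(out), '_');
    return out;
}

// Hypothetical helper: evaluate a separated decimal base such as "1_6" -> 16.
// Assumes the input was already validated to contain only digits and '_'
// (std::stoul throws on an empty or non-numeric string).
unsigned base_value(std::string_view text)
{
    return static_cast<unsigned>(std::stoul(strip_separators(text)));
}
```

For example, strip_separators("DEAD_BEEF") yields "DEADBEEF", and base_value("1_6") yields 16.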

The core idea (I used Qi's Nabialek trick a long time ago) is something like this:

    auto const char_set = [](auto&& char_range, char const* name) {
        return x3::rule<struct _, std::string>{ name } = x3::as_parser(
            x3::raw[ x3::char_(char_range) >> *(-lit("_") >> x3::char_(char_range)) ]);
    };
    auto const bin_charset = char_set("01", "binary charset");
    auto const oct_charset = char_set("0-7", "octal charset");
    auto const dec_charset = char_set("0-9", "decimal charset");
    auto const hex_charset = char_set("0-9a-fA-F", "hexadecimal charset");
    
    using Value = ast::number;
    using It    = std::string::const_iterator;
    using Rule  = x3::any_parser<It, Value>;
    
    x3::symbols<Rule> const based_parser({
            { 2,  as<std::string>[ bin_charset ] },
            { 8,  as<std::string>[ oct_charset ] },
            { 10, as<std::string>[ dec_charset ] },
            { 16, as<std::string>[ hex_charset ] }
        }, "based character set"
    );
    
    auto const base = x3::rule<struct _, unsigned>{ "base" } = dec_charset; // simplified
    
    auto const parser = x3::with<Rule>(Rule{}) [
        x3::lexeme[ set_lazy<Rule>[based_parser] >> '#' >> do_lazy<Rule> ]
    ];
    
    auto const grammar = x3::skip[ x3::space ]( parser >> x3:: eoi );   

and use them like

    for (std::string const input : {
            "2#0101",
            "8#42",
            "10#4711",
            "1_6#DEAD_BEEF",
        })
    {
       ...
    }

Well, it doesn't compile, so I don't know whether it would work this way. I think it's a better approach than a long list of alternatives (as in my old code). Further, newer standards of the grammar I want to implement extend the syntax with a leading integer (for the numeric width) and other base specifiers, e.g. 'UB', 'UO' and others. This is somewhat off-topic, but: how can I prepare the code for further grammar extensions (using something like eps[get<std_tag>(ctx) == x42])?

For convenience, I've put the example on Coliru.


Solution

  • Well, it doesn't compile and hence I do not know if it would work this way.

    Where to start? Let me recommend baby steps. X3 is not a framework where you can throw together a bunch of code and expect it to compile, let alone do what you want.

    Some notes:

    Let me combine the factories:

    template<typename...> struct Tag { };
    template<typename T, typename P>
    auto
    as(P p, char const* name = "as")
    {
        return x3::rule<Tag<T, P>, T>{name} = x3::as_parser(p);
    }
    

    Now you can simply write

    auto const delimit_numeric_digits = [](auto&& char_range, char const* name)
    {
        auto cs = x3::char_(char_range);
        return as<std::string>(x3::raw[cs >> *('_' >> +cs | cs)], name);
    };
    auto const bin_digits = delimit_numeric_digits("01", "binary digits");
    auto const oct_digits = delimit_numeric_digits("0-7", "octal digits");
    auto const dec_digits = delimit_numeric_digits("0-9", "decimal digits");
    auto const hex_digits = delimit_numeric_digits("0-9a-fA-F", "hexadecimal digits");
    

    (Note the improved naming: "charset" didn't really describe what these rules parse.)
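To make the separator rule concrete: cs >> *('_' >> +cs | cs) accepts a leading digit followed by any mix of digits and single underscores, where every underscore must be followed by at least one digit. The function below is a hypothetical hand-written mirror of that PEG rule in plain C++ (it is not X3 code). It returns how many characters the rule would consume, so a trailing or doubled underscore simply ends the match, just as the Kleene star stops without consuming the failed alternative.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical mirror of: cs >> *('_' >> +cs | cs)
// `is_digit` plays the role of x3::char_(char_range).
// Returns the number of characters consumed, or 0 if there is no match.
template <typename Pred>
std::size_t match_separated(std::string_view in, Pred is_digit)
{
    std::size_t i = 0;
    if (i == in.size() || !is_digit(in[i]))
        return 0;                      // the leading `cs` is mandatory
    ++i;
    for (;;) {
        if (i < in.size() && is_digit(in[i])) {
            ++i;                       // alternative `cs`
        } else if (i + 1 < in.size() && in[i] == '_' && is_digit(in[i + 1])) {
            i += 2;                    // `'_' >> +cs`: underscore plus first digit;
                                       // remaining digits are taken by the `cs`
                                       // branch, so the total consumed is the same
        } else {
            return i;                  // Kleene star stops here
        }
    }
}
```

With a hexadecimal predicate, "DEAD_BEEF" is consumed completely (9 characters), while for binary digits "01_" stops after "01" and "0__1" stops after "0"; the leftover input then makes the surrounding `>> x3::eoi` fail.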

    Next, fixing the symbol lookup:

    using Rule = x3::any_parser<It, std::string>;
    
    x3::symbols<Rule> const based_parser({
        {"2#", bin_digits},
        {"8#", oct_digits},
        {"10#", dec_digits},
        {"16#", hex_digits},
    });
    

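    Note that the digit parsers only synthesize std::string, not the base. Now use the lazy-parser trick from the linked answer to still expose the base as an integer. The symbol lookup sits inside an and-predicate (&), so it selects the digit parser without consuming input, and x3::uint_ >> '#' then parses the base itself: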

    auto const parser                              //
        = x3::rule<struct _, Value, true>{"Value"} //
        = x3::with<Rule>(Rule{})[                  //
        x3::lexeme
            [&set_lazy<Rule>[based_parser] >> x3::uint_ >> '#' >> do_lazy<Rule>]];
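The control flow of that parser can be sketched without X3. The plain-C++ code below is a hypothetical, simplified analogue: std::map and std::function stand in for x3::symbols and x3::any_parser, the names digits_of and parse_based are illustrative, and digit separators are ignored for brevity. A table maps a prefix such as "16#" to a digit parser. The lookup only peeks at the input (like &) and remembers the chosen parser (like set_lazy). The base is then parsed as an unsigned number followed by '#', and finally the remembered parser handles the digits (like do_lazy). As with x3::symbols, the lookup prefers the longest matching key.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>

struct number { unsigned base; std::string literal; };

// A digit parser consumes a prefix of `in` and returns how many chars it took.
using DigitParser = std::function<std::size_t(std::string_view)>;

// Hypothetical digit parser for a plain character class (no '_' separators).
DigitParser digits_of(std::string_view charset)
{
    return [charset](std::string_view in) {
        std::size_t n = 0;
        while (n < in.size() && charset.find(in[n]) != std::string_view::npos)
            ++n;
        return n;
    };
}

std::optional<number> parse_based(std::string_view in)
{
    static const std::map<std::string, DigitParser> table = {
        {"2#", digits_of("01")},
        {"8#", digits_of("01234567")},
        {"10#", digits_of("0123456789")},
        {"16#", digits_of("0123456789abcdefABCDEF")},
    };

    // 1. Look ahead without consuming: pick the longest key prefixing `in`.
    const DigitParser* lazy = nullptr;
    std::size_t best = 0;
    for (const auto& [key, p] : table)
        if (key.size() > best && in.substr(0, key.size()) == std::string_view(key)) {
            lazy = &p;
            best = key.size();
        }
    if (!lazy) return std::nullopt;

    // 2. Parse the base again as an unsigned integer, then expect '#'.
    std::size_t i = 0;
    unsigned base = 0;
    while (i < in.size() && in[i] >= '0' && in[i] <= '9')
        base = base * 10 + static_cast<unsigned>(in[i++] - '0');
    if (i == in.size() || in[i] != '#') return std::nullopt;
    ++i;

    // 3. Run the remembered parser on the rest; require at least one digit
    //    and that all input is consumed (like `>> x3::eoi`).
    std::size_t n = (*lazy)(in.substr(i));
    if (n == 0 || i + n != in.size()) return std::nullopt;
    return number{base, std::string(in.substr(i, n))};
}
```

In this sketch "16#DEADBEEF" yields base 16 with literal "DEADBEEF", "8#9" is rejected by the octal digit parser, and "1_6#DEADBEEF" is rejected because no key matches a separated base.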
    

    Live Demo (on Coliru)

    //#define BOOST_SPIRIT_X3_DEBUG
    #include <boost/spirit/home/x3.hpp>
    #include <boost/fusion/adapted/struct.hpp>
    
    #include <iostream>
    #include <iomanip>
    
    namespace x3 = boost::spirit::x3;
    
    namespace ast {
        struct number {
            unsigned    base;
            std::string literal;
        };
    }
    
    BOOST_FUSION_ADAPT_STRUCT(ast::number, base, literal)
    
    std::ostream&
    operator<<(std::ostream& os, ast::number const n)
    {
        return os << n.base << '#' << n.literal;
    }
    
    namespace Parsing {
    
    template<typename...> struct Tag { };
    template<typename T, typename P>
    auto
    as(P p, char const* name = "as")
    {
        return x3::rule<Tag<T, P>, T>{name} = x3::as_parser(p);
    }
    
    template<typename Tag>
    struct set_lazy_type
    {
        template<typename P>
        auto
        operator[](P p) const
        {
            auto action = [](auto& ctx) { // set rhs parser
                x3::get<Tag>(ctx) = x3::_attr(ctx);
            };
            return p[action];
        }
    };
    
    template<typename Tag>
    struct do_lazy_type : x3::parser<do_lazy_type<Tag>>
    {
        using attribute_type = typename Tag::attribute_type; // TODO FIXME?
    
        template<typename It, typename Ctx, typename RCtx, typename Attr>
        bool
        parse(It& first, It last, Ctx& ctx, RCtx& rctx, Attr& attr) const
        {
            auto& subject = x3::get<Tag>(ctx);
    
            It saved = first;
            x3::skip_over(first, last, ctx);
            if(x3::as_parser(subject).parse(
                   first,
                   last,
                   std::forward<Ctx>(ctx),
                   std::forward<RCtx>(rctx),
                   attr))
            {
                return true;
            } else
            {
                first = saved;
                return false;
            }
        }
    };
    
    template<typename T> static const set_lazy_type<T> set_lazy{};
    template<typename T> static const do_lazy_type<T> do_lazy{};
    
    auto const delimit_numeric_digits = [](auto&& char_range, char const* name)
    {
        auto cs = x3::char_(char_range);
        return as<std::string>(x3::raw[cs >> *('_' >> +cs | cs)], name);
    };
    auto const bin_digits = delimit_numeric_digits("01", "binary digits");
    auto const oct_digits = delimit_numeric_digits("0-7", "octal digits");
    auto const dec_digits = delimit_numeric_digits("0-9", "decimal digits");
    auto const hex_digits = delimit_numeric_digits("0-9a-fA-F", "hexadecimal digits");
    
    using Value = ast::number;
    using It = std::string::const_iterator;
    using Rule = x3::any_parser<It, std::string>;
    
    x3::symbols<Rule> const based_parser({
        {"2#", bin_digits},
        {"8#", oct_digits},
        {"10#", dec_digits},
        {"16#", hex_digits},
    });
    
    auto const parser                              //
        = x3::rule<struct _, Value, true>{"Value"} //
        = x3::with<Rule>(Rule{})[                  //
        x3::lexeme
            [&set_lazy<Rule>[based_parser] >> x3::uint_ >> '#' >> do_lazy<Rule>]];
    
    auto const grammar = x3::skip(x3::space)[parser >> x3::eoi];
    } // namespace Parsing
    
    int main()
    {
        for(std::string const input : {
                "2#0101",
                "8#42",
                "10#4711",
                "1_6#DEAD_BEEF",
            })
        {
            Parsing::Value attr;
            if(parse(begin(input), end(input), Parsing::grammar, attr))
            {
                std::cout << std::quoted(input) << " -> success (" << attr << ")\n";
            } else
            {
                std::cout << std::quoted(input) << " -> failed\n";
            }
        }
    }
    

    Prints (the last input fails because none of the based_parser keys matches the separated base "1_6#"):

    "2#0101" -> success (2#0101)
    "8#42" -> success (8#42)
    "10#4711" -> success (10#4711)
    "1_6#DEAD_BEEF" -> failed