Tags: c++, parsing, boost-spirit, boost-spirit-x3

boost x3 grammar for structs with multiple constructors


I'm trying to figure out how to parse structs that have multiple (overloaded) constructors. For example, a range struct that holds either a proper range or a singleton, where the start and end of the range are equal.

Case 1 looks like:

"start-stop"

Case 2:

"start"

For the range case

auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);

works, but

auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);

unsurprisingly doesn't match the attribute signature and fails to compile.

What is the fix?

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange
{
    size_t start;
    size_t end;
    // little bit weird because should be end+1, but w/e
    explicit MyRange(size_t start, size_t end = 0) : start(start), end(end == 0 ? start : end)
    {
    }
};
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
// BOOST_FUSION_ADAPT_STRUCT(MyRange, start)
//

int main()
{
    auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
    // auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);

    for (std::string input : {"1-2", "1", "1-", "garbage"})
    {
        auto success = x3::phrase_parse(input.begin(), input.end(),
                                        // Begin grammar
                                        range_constraint,
                                        // End grammar
                                        x3::ascii::space);
        std::cout << "`" << input << "`"
                  << "-> " << success << std::endl;
    }
    return 0;
}

Solution

  • It's important to realize that Fusion sequence adaptation, by definition, default-constructs the struct and then assigns the sequence elements one by one.

    Another issue is branch ordering in PEG grammars: int_ always succeeds wherever int_ >> '-' >> int_ would, so the range version can never match.

    Finally, to parse size_t, prefer uint_ or uint_parser<size_t> :)

    Things That Don't Work

    There are several ways to skin this cat. For one, there's BOOST_FUSION_ADAPT_STRUCT_NAMED, which would allow you to do

    BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, Range, start, end)
    BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, SingletonRange, start)
    

    So one rather elaborate approach would be to spell it out:

    auto range     = x3::rule<struct _, Range>{}          = uint_ >> '-' >> uint_;
    auto singleton = x3::rule<struct _, SingletonRange>{} = uint_;
    auto rule      = x3::rule<struct _, MyRange>{}        = range | singleton;
    

    TIL that this doesn't even compile; apparently Qi behaved differently: Live On Coliru

    X3 requires the attribute to be default-constructible whereas Qi would attempt to bind to the passed-in attribute reference first.

    Even in the Qi version you can see that, because Fusion sequences are default-constructed-then-memberwise-assigned, the results are not what you expected or wanted:

    `1-2` -> true
     -- [1,NIL)
    `1` -> true
     -- [1,NIL)
    `1-` -> true
     -- [1,NIL)
    `garbage` -> false
    

    What Works

    Instead of doing the complicated things, do the simple thing. Whenever you see an optional value, you can usually provide a default value. Alternatively, you can skip Sequence adaptation altogether and go straight to semantic actions.

    Semantic Actions

    The simplest way would be to have specific branches:

    auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
    auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx))); };
    
    auto rule = x3::rule<void, MyRange>{} =
        (uint_ >> '-' >> uint_)[assign2] | uint_[assign1];
    

    Slightly more advanced, but more efficient:

    auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
    auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(_val(ctx).start, _attr(ctx)); };
    
    auto rule = x3::rule<void, MyRange>{} = uint_[assign1] >> -('-' >> uint_[assign2]);
    

    Lastly, we can move towards defaulting the optional end:

    auto rule = x3::rule<void, MyRange>{} =
        (uint_ >> ('-' >> uint_ | x3::attr(MyRange::unspecified))) //
            [assign];
    

    Now the semantic action will have to deal with the variant end type:

    auto assign = [](auto& ctx) {
        auto start = at_c<0>(_attr(ctx));
        _val(ctx)  = apply_visitor(                         //
            [=](auto end) { return MyRange(start, end); }, //
            at_c<1>(_attr(ctx)));
    };
    

    Also Live On Coliru

    Simplify?

    I'd consider modeling the range explicitly as having an optional end:

    struct MyRange {
        MyRange() = default;
        MyRange(size_t s, boost::optional<size_t> e = {}) : start(s), end(e) {
            assert(!e || *e >= s);
        }
    
        size_t size() const  { return end? *end - start : 1; }
        bool   empty() const { return size() == 0; }
    
        size_t                  start = 0;
        boost::optional<size_t> end   = 0;
    };
    

    Now you can directly use the optional to construct:

    auto assign = [](auto& ctx) {
        _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx)));
    };
    
    auto rule = x3::rule<void, MyRange>{} = (uint_ >> -('-' >> uint_))[assign];
    

    Actually, here we can go back to using adapted sequences, although with different semantics:

    Live On Coliru

    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/home/x3.hpp>
    #include <iomanip>
    #include <iostream>
    namespace x3 = boost::spirit::x3;
    
    struct MyRange {
        size_t                  start = 0;
        boost::optional<size_t> end;  // disengaged unless an end is parsed
    };
    
    static inline std::ostream& operator<<(std::ostream& os, MyRange const& mr) {
        if (mr.end)
            return os << "[" << mr.start << "," << *mr.end << ")";
        else
            return os << "[" << mr.start << ",)";
    }
    
    BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
    
    int main() {
        x3::uint_parser<size_t> uint_;
        auto rule = x3::rule<void, MyRange>{} = uint_ >> -('-' >> uint_);
    
        for (std::string const input : {"1-2", "1", "1-", "garbage"}) {
            MyRange into;
            auto    success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
            std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
                      << std::endl;
    
            if (success) {
                std::cout << " -- " << into << "\n";
            }
        }
    }
    

    Summarizing

    I hope these strategies give you everything you need. Pay close attention to the semantics of your range. Specifically, I never paid any attention to the difference between "1" and "1-". You might want one to be [1,2) and the other to be [1,inf), both to be equivalent, or the second one might even be considered invalid.

    Stepping back even further, I'd suggest that maybe you just needed

    using Bound   = std::optional<size_t>;
    using MyRange = std::pair<Bound, Bound>;
    

    Which you could parse directly with:

    auto boundary = -x3::uint_parser<size_t>{};
    auto rule = x3::rule<void, MyRange>{} = boundary >> '-' >> boundary;
    

    It would allow for more inputs:

    for (std::string const input : {"-2", "1-2", "1", "1-", "garbage"}) {
        MyRange into;
        auto    success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
        std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
                  << std::endl;
    
        if (success) {
            std::cout << " -- " << into << "\n";
        }
    }
    

    Prints: Live On Coliru

    `-2` -> true
     -- [,2)
    `1-2` -> true
     -- [1,2)
    `1` -> false
    `1-` -> true
     -- [1,)
    `garbage` -> false