c++boostboost-spirit

What's wrong with my C++ Boost Spirit Grammar & Rules?


I'm still pretty new to Boost and C++ (only a week in learning this). With the great help from Sehe from my other post Boost Spirit question here I'm trying to extend the Simple class to parse and accept more syntax but what I have doesn't seem to work right.

Currently, Simple class can handle this input:

Class Simple caption;
Class Simple columns "Column Name";

This is the original Simple class from the above link

using Id = std::string;
using Literal = std::string;
using Enum = std::vector<Id>;

struct Base {
    Id      id;
    Literal literal;
};

struct Simple : Base {
    Enum enumeration;
};
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration);

And the grammar rule

id_ = raw[alpha >> *('_' | alnum)];
id_ = raw[alpha >> *('_' | alnum)];
literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
auto optlit = copy(literal_ | attr(std::string(" ")));

simple_ = lit("Simple") >> id_ >> optlit >> enum_;
enum_ = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');

And the expected outcome is:

<class>
    <simple>
      <identifier>caption:caption</identifier>
      <literal>" "</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>columns:columns</identifier>
      <literal>"Column Name"</literal>
    </simple>
  </class>

I'm trying to extend this rule so the Simple class can accept the following:

Class Simple caption;
Class Simple columns "Column Name";
Class Simple some_value datatype restriction;
Class Simple other_value Default 0;

And the new expected outcome should be:

<class>
    <simple>
      <identifier>caption:caption</identifier>
      <literal>" "</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>columns:columns</identifier>
      <literal>"Column Name"</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>some_value:some_value</identifier>
      <literal>" "</literal>
      <word>restriction</word>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>other_value:other_value</identifier>
      <literal>" "</literal>
      <default>
            <num>0</num>
      </default>
    </simple>
  </class>

This is what I did but Simple class did not correctly parse the above.

using Id = std::string;
using Literal = std::string;
using Datatype = std::string;
using Default = std::string;
using Enum = std::vector<Id>;

struct Base {
    Id      id;
    Literal literal;
};

struct Simple : Base {
    Enum enumeration;
    Datatype datatype;
        Default def;
};
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration, datatype, def);

And for the grammar rule, I did this:

id_ = raw[alpha >> *('_' | alnum)];
id_ = raw[alpha >> *('_' | alnum)];
literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
auto optlit = copy(literal_ | attr(std::string(" ")));

simple_ = lit("Simple") >> id_ >> optlit >> (enum_ | datatype_ | default_);
enum_ = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');
datatype_ = -(lit("datatype") >> id_);
default_ = -(lit("Default") >> alnum)

qi::rule<It, Ast::Simple(), Skipper>    simple_;
qi::rule<It, Ast::Enum(), Skipper>      enum_;
qi::rule<It, Ast::Datatype(), Skipper>  datatype_;
qi::rule<It, Ast::Default(), Skipper>  default_;

I keep getting parse error

 -> EXPECTED ";" in line:3
       Class Simple my_value datatype restriction;
                             ^--- here

It doesn't seem to accept and parse the datatype. What did I miss or do wrong here?
Thanks for your help

Full Demo here: Live on Compiler Explorer


Solution

  • One problem I already addressed in the comment:

    it seems like you want (-a >> -b >> -c) instead of -(a|b|c)

    Indeed, running your program prints

     -> EXPECTED ";" in line:3
           Class Simple my_value datatype restriction;
                                 ^--- here
    

    On the one hand, I'm thrilled that now my VeryPretty™ error reporting got a chance to shine, but clearly you wanted to change

    simple_    = lit("Simple") >> id_ >> optlit >> (enum_ | datatype_ | default_);
    

    To

    simple_    = lit("Simple") >> id_ >> optlit >> enum_ >> datatype_ >> default_;
    

    Indeed, changing that around prints:

    <task>
      <class>
        <simple>
          <identifier>caption</identifier>
          <literal> </literal>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>columns</identifier>
          <literal>Column Name</literal>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>my_value</identifier>
          <literal> </literal>
          <word>restriction</word>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>def_value</identifier>
          <literal> </literal>
          <default>
            <num>0</num>
          </default>
        </simple>
      </class>
    </task>
    

    However several things seem out of order. For one the appearance of "word" in the XML seems odd, since Simple does not contain such a member. It does seem like the implementation details are leaking out in the XML.

    I can guess that this is caused by the lack of control that is, itself, caused by ambiguous types:

    using Id       = std::string;
    using Literal  = std::string;
    using Datatype = std::string;
    using Default  = std::string;
    

    Choosing identical types for domain datatypes is fine, but it breaks down when you try to use the C++ type system to enforce domain logic. Specifically, we use overload resolution to pick what XML to generate for an AST node.

    You may note that I had already realized this conundrum in my original sketch, which is why you see the overload generating for std::string take a disambiguator kind:

    void apply(Node parent, std::string const& s, char const* kind) const {
    

    And there it is that we run into a different code smell:

    if (s != "") {
        named_child(parent, kind).text().set(s.c_str());
    }
    

    I know I didn't have the conditional there. And the conditional suffers the same problem as I just described: it is blind to the domain semantics of the "thing" being serialized. Some strings might be "present" and need to be serialized even if empty (e.g. consider when a default value is the empty string, dropping the node entirely may change the meaning).

    Instead, you want to express intent/semantics in the AST itself.

    Expressive AST

    Let's create distinct string types.

    I showed one approach in the first Live Demo before. I "cleverly" omitted that hack from the answer because it wasn't essential to the final solution, and also because that solution using inheritance violates Liskov Substitution Principle as well as risks surprises (e.g.).

    So let's do another approach. Still using a hack to avoid writing lots of code, but without the issues described:

    template <typename> struct flavour_hack : std::char_traits<char> {};
    template <typename Tag>
        using String = std::basic_string<char, flavour_hack<Tag> >;
    
    using Id       = String<struct TagId>;
    using Literal  = String<struct TagLiteral>;
    using Datatype = String<struct TagDatatype>;
    using Default  = String<struct TagDefault>;
    

    Yes these are fully standard C++ strings, but distinct from eachother. Nice. Now the above overload can be split into different overloads, dropping the kind parameter since that information is now part of the AST type:

    void apply(Node parent, Ast::Id const& id) const {
        named_child(parent, "identifier").text().set(id.c_str());
    }
    
    void apply(Node parent, Ast::Literal const& literal) const {
        named_child(parent, "literal").text().set(literal.c_str());
    }
    
    void apply(Node parent, Ast::Datatype const& datatype) const {
        named_child(parent, "datatype").text().set(datatype.c_str());
    }
    
    void apply(Node parent, Ast::Default const& default_) const {
        named_child(parent, "default").text().set(default_.c_str());
    }
    

    What about the optional strings? Well I'm glad you asked. And I'm glad hypothetical you phrased the question accurately: optional strings are just strings, that are optional. It's easy to express the semantics in C++:

    struct Simple : Base {
        Enum                    enumeration;
        std::optional<Datatype> datatype;
        std::optional<Default>  def;
    };
    

    What's more, now I can probably stop being lazy about the enumeration as well. Before, I just decided/assumed on your behalf that empty enumerations couldn't occur. Let's not, just to drive home the point about domain semantics:

    struct Simple : Base {
        std::optional<Enum>     enumeration;
        std::optional<Datatype> datatype;
        std::optional<Default>  def;
    };
    

    And fixing the rule for enumeration:

    enum_ = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
    

    Moving the optionality to the parent rule smooths out the mapping of rules to their bound attributes:

    simple_    = lit("Simple") >> id_ >> optlit >> -enum_ >> -datatype_ >> -default_;
    enum_      = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
    datatype_  = lit("datatype") >> id_;
    default_   = lit("Default") >> alnum;
    

    With these tweaks, we get Live Demo for

    Class Simple A;
    Class Simple B "Column Name";
    Class Simple C datatype restriction;
    Class Simple D Default 0;
    Class Simple E enumeration() Default 0;
    Class Simple F enumeration() Default "not_a_number";
    

    The output:

    <task>
      <class>
        <simple>
          <identifier>A</identifier>
          <literal> </literal>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>B</identifier>
          <literal>Column Name</literal>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>C</identifier>
          <literal> </literal>
          <datatype>restriction</datatype>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>D</identifier>
          <literal> </literal>
          <default>0</default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>E</identifier>
          <literal> </literal>
          <default>0</default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>F</identifier>
          <literal> </literal>
          <default>"not_a_number"</default>
        </simple>
      </class>
    </task>
    

    Remaining Loose Ends

    I've ignored this bit:

    // TODO FIXME
    //void apply(Node parent, Default const& d) const {
        //if (d != "") {
            //auto def_ = named_child(parent, "default");
            //named_child(def_, "num").text().set(d.c_str());
        //}
    //}
    

    I did so, because it opens an entirely new can of worms. The "arbitrary" appearance of a num XML element tells me that your language has a type system with ditto literals. And that the parser expression

    default_ = -(lit("Default") >> alnum)
    

    was likely woefully underspecified. Do yourself a favor and don't settle for sloppy a solution. Instead, just model what you know to be true and enjoy a life without unnecessary complexity and hard-to-maintain code. E.g.:

    using Number = double;
    using Value  = boost::variant<Literal, Number, Id>;
    
    struct Simple : Base {
        std::optional<Enum>     enumeration;
        std::optional<Datatype> datatype;
        std::optional<Value>    def;
    };
    

    With ditto

    value_     = literal_ | number_ | id_;
    number_    = double_;
    default_   = lit("Default") >> value_;
    

    And some tweaks to XML generator:

    Live On Compiler Explorer

    // #define BOOST_SPIRIT_DEBUG 1
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    #include <optional>
    namespace qi = boost::spirit::qi;
    
    namespace Ast {
        using boost::recursive_wrapper;
    
        template <typename> struct flavour_hack : std::char_traits<char> {};
        template <typename Tag>
            using String = std::basic_string<char, flavour_hack<Tag> >;
    
        using Id       = String<struct TagId>;
        using Literal  = String<struct TagLiteral>;
        using Datatype = String<struct TagDatatype>;
        struct Base {
            Id      id;
            Literal literal;
        };
    
        using Enum = std::vector<Id>;
    
        using Number = double;
        using Value  = boost::variant<Literal, Number, Id>;
    
        struct Simple : Base {
            std::optional<Enum>     enumeration;
            std::optional<Datatype> datatype;
            std::optional<Value>    default_;
        };
    
        struct Complex;
        struct Container;
    
        using Class = boost::variant<   
            Simple,                     
            recursive_wrapper<Complex>, 
            recursive_wrapper<Container>
        >;
    
        using Classes = std::vector<Class>;
        struct Container : Base { Class   element; };
        struct Complex   : Base { Classes members; };
    
        using Task = std::vector<Class>;
    } // namespace Ast
    
    BOOST_FUSION_ADAPT_STRUCT(Ast::Simple,    id, literal, enumeration, datatype, default_);
    BOOST_FUSION_ADAPT_STRUCT(Ast::Complex,   id, literal, members)
    BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)
    
    namespace Parser {
        template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
            Task() : Task::base_type(start) {
                using namespace qi;
    
                start = skip(space)[task_];
    
                // lexemes:
                id_      = raw[alpha >> *('_' | alnum)];
                literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
    
                auto optlit = copy(literal_ | attr(std::string(" "))); // weird, but okay
    
                task_      = *class_ > eoi;
                type_      = simple_ | complex_ | container_;
                class_     = lit("Class") > type_ > ';';
                simple_    = lit("Simple") >> id_ >> optlit >> -enum_ >> -datatype_ >> -default_;
                complex_   = lit("Complex") >> id_ >> optlit >> '(' >> *type_ >> ')';
                container_ = lit("Container") >> id_ >> optlit >> '(' >> type_ > ')';
                enum_      = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
                datatype_  = lit("datatype") >> id_;
                value_     = literal_ | number_ | id_;
                number_    = double_;
                default_   = lit("Default") >> value_;
    
                BOOST_SPIRIT_DEBUG_NODES(
                    (task_)(class_)(type_)(simple_)(complex_)(container_)(enum_)(datatype_)
                    (default_)(id_)(literal_)(value_)(number_)
                )
            }
    
        private:
            qi::rule<It, Ast::Task()> start;
    
            using Skipper = qi::space_type;
            qi::rule<It, Ast::Task(), Skipper>      task_;
            qi::rule<It, Ast::Class(), Skipper>     class_, type_;
            qi::rule<It, Ast::Simple(), Skipper>    simple_;
            qi::rule<It, Ast::Complex(), Skipper>   complex_;
            qi::rule<It, Ast::Container(), Skipper> container_;
            qi::rule<It, Ast::Enum(), Skipper>      enum_;
            qi::rule<It, Ast::Datatype(), Skipper>  datatype_;
            qi::rule<It, Ast::Value(), Skipper>     default_;
    
            // lexemes:
            qi::rule<It, Ast::Id()>      id_;
            qi::rule<It, Ast::Literal()> literal_;
            qi::rule<It, Ast::Value()>   value_;
            qi::rule<It, Ast::Number()>  number_;
        };
    }
    
    #include <pugixml.hpp>
    namespace Generate {
        using namespace Ast;
    
        struct XML {
            using Node = pugi::xml_node;
    
            // callable for variant visiting:
            template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }
    
        private:
            template <typename... Ts>
            void apply(Node parent, boost::variant<Ts...> const& v) const {
                using std::placeholders::_1;
                boost::apply_visitor(std::bind(*this, parent, _1), v);
            }
    
            void apply(Node parent, Ast::Number const& num) const {
                named_child(parent, "num").text().set(num);
            }
    
            void apply(Node parent, Ast::Id const& id) const {
                named_child(parent, "identifier").text().set(id.c_str());
            }
    
            void apply(Node parent, Ast::Literal const& literal) const {
                named_child(parent, "literal").text().set(literal.c_str());
            }
    
            void apply(Node parent, Ast::Datatype const& datatype) const {
                named_child(parent, "datatype").text().set(datatype.c_str());
            }
    
            template <typename T> void apply(Node parent, std::optional<T> const& opt) const {
                if (opt)
                    apply(parent, *opt);
            }
    
            void apply(Node parent, Simple const& s) const {
                auto simple = named_child(parent, "simple");
                apply(simple, s.id);
                apply(simple, s.literal);
                apply(simple, s.enumeration);
                apply(simple, s.datatype);
                if (s.default_.has_value()) {
                    apply(named_child(simple, "default"), *s.default_);
                }
            }
    
            void apply(Node parent, Enum const& e) const {
                auto enum_ = named_child(parent, "enumeration");
                for (auto& v : e)
                    named_child(enum_, "word").text().set(v.c_str());
            }
    
            void apply(Node parent, Complex const& c) const {
                auto complex_ = named_child(parent, "complex");
                apply(complex_, c.id);
                apply(complex_, c.literal);
                for (auto& m : c.members)
                    apply(complex_, m);
            }
    
            void apply(Node parent, Container const& c) const {
                auto cont = named_child(parent, "container");
                apply(cont, c.id);
                apply(cont, c.literal);
                apply(cont, c.element);
            }
    
            void apply(Node parent, Task const& t) const {
                auto task = named_child(parent, "task");
                for (auto& c : t)
                    apply(task.append_child("class"), c);
            }
    
        private:
            Node named_child(Node parent, std::string const& name) const {
                auto child = parent.append_child();
                child.set_name(name.c_str());
                return child;
            }
        };
    } // namespace Generate
    
    int main() { 
        using It = std::string_view::const_iterator;
        static const Parser::Task<It> p;
        static const Generate::XML to_xml;
    
        for (std::string_view input :
        {
            R"(Class Simple A;
        Class Simple B "Column Name";
        Class Simple C datatype restriction;
        Class Simple D enumeration() Default NaN;
        Class Simple E Default -Inf;
        Class Simple F Default 42;
        Class Simple G Default 7.8e-10;
        Class Simple H Default "not a number";
        Class Simple I Default SomeIdentifier_3;
            )"
        }) //
        {
            try {
                Ast::Task t;
    
                if (qi::parse(begin(input), end(input), p, t)) {
                    pugi::xml_document doc;
                    to_xml(doc.root(), t);
                    doc.print(std::cout, "  ", pugi::format_default);
                    std::cout << std::endl;
                } else {
                    std::cout << " -> INVALID" << std::endl;
                }
            } catch (qi::expectation_failure<It> const& ef) {
                auto f    = begin(input);
                auto p    = ef.first - input.begin();
                auto bol  = input.find_last_of("\r\n", p) + 1;
                auto line = std::count(f, f + bol, '\n') + 1;
                auto eol  = input.find_first_of("\r\n", p);
    
                std::cerr << " -> EXPECTED " << ef.what_ << " in line:" << line << "\n"
                    << input.substr(bol, eol - bol) << "\n"
                    << std::setw(p - bol) << ""
                    << "^--- here" << std::endl;
            }
        }
    }
    

    Printing

    <task>
      <class>
        <simple>
          <identifier>A</identifier>
          <literal> </literal>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>B</identifier>
          <literal>Column Name</literal>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>C</identifier>
          <literal> </literal>
          <datatype>restriction</datatype>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>D</identifier>
          <literal> </literal>
          <enumeration />
          <default>
            <num>nan</num>
          </default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>E</identifier>
          <literal> </literal>
          <default>
            <num>-inf</num>
          </default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>F</identifier>
          <literal> </literal>
          <default>
            <num>42</num>
          </default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>G</identifier>
          <literal> </literal>
          <default>
            <num>7.7999999999999999e-10</num>
          </default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>H</identifier>
          <literal> </literal>
          <default>
            <literal>not a number</literal>
          </default>
        </simple>
      </class>
      <class>
        <simple>
          <identifier>I</identifier>
          <literal> </literal>
          <default>
            <identifier>SomeIdentifier_3</identifier>
          </default>
        </simple>
      </class>
    </task>