Tags: c++, boost-spirit, boost-spirit-qi

Parsing to different value types in Boost.Spirit and applying a cast to negative numbers


I am trying to solve an issue with positive and negative values in Boost Spirit.

The parser should use unsigned numbers (positive) 99% of the time.

The program reads a string that defines checks on variables of 1 to 32 bits; the variables themselves are read from another stream (mentioned for context, not shown in the example). There is one special case: the variable "D_REF" may be a 16-bit signed number (2's complement).

The program encodes all checks as unsigned values in a std::vector, so I need to encode that signed value as unsigned too. Before storing it in the unsigned int member of the struct, however, I need to cast it to unsigned short.

This requirement comes from a later processing step: a data stream is read, values are extracted from it as unsigned, and the parsed comparisons are then applied to them.

I know this request may look weird, but it is a must for a current project, so can anyone help me with this?

Godbolt link: https://godbolt.org/z/8j615Mecx

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

namespace engine
{
    struct Check
    {
        std::string variable;
        unsigned int number;
    };

    using Checks = std::vector<Check>;
}

BOOST_FUSION_ADAPT_STRUCT(engine::Check, variable, number)

namespace engine
{
    namespace qi = boost::spirit::qi;

    template <typename It>
    class Parser : public qi::grammar<It, Checks()>
    {
    private:
        qi::rule<It, Check(), qi::blank_type> equal1, equal2;
        qi::rule<It, Checks()> start;

    public:
        Parser() : Parser::base_type(start)
        {
            using namespace qi;

            //equal1 = as_string["MSG33.D_REF"] >> "==" >> int_[static_cast<unsigned short>(_1)]; // This is the idea...
            equal1 = as_string["MSG33.D_REF"] >> "==" >> int_; // May be negative, but the value is only 16 bits wide, so it must be cast to "unsigned short" rather than "unsigned int"
            equal2 = +(alnum | char_("._")) >> "==" >> uint_;

            start = skip(blank)[(equal1 | equal2) % "&&"] > eoi;
        }
    };

    Checks parse(const std::string& str)
    {
        using It = std::string::const_iterator;
        static const Parser<It> parser;

        Checks checks;

        It first = str.begin(), last = str.end();
        if (!qi::parse(first, last, parser, checks))
            return {};

        return checks;
    }
}

int main()
{
    auto checks1 = engine::parse("MSG33.ANYTHING == 25"); // Normal case. All checks compare against positive values
    auto checks2 = engine::parse("MSG33.D_REF == 25"); // Special case, extending the normal case: a signed variable checked against a positive value
    auto checks3 = engine::parse("MSG33.D_REF == -25"); // Special case: a negative value. D_REF should be encoded as 16-bit 2's complement, but it is converted to 32-bit unsigned

    std::cout << std::hex << "Obtained: " << checks3.front().number << std::endl << "Wished: " << static_cast<unsigned short>(checks3.front().number); // It displays 0xffffffe7, but I need 0xffe7. Is there a semantic action that forces the conversion before the vector insertion?
}

Solution

  • First: A word of caution

    Automatic attribute propagation already does exactly what you need. That's pretty much what you'd expect since it compiles.

    Your problem really has nothing to do with the parsing at all. It has to do with how you interpret the correctly parsed negative number, correctly converted to the integer type you chose (unsigned int).

    Indeed, if you want to treat an unsigned int value as a short (signed or unsigned) you have to coerce it, or use a bitmask to clear the high bits: c.number & 0xffff.
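
    To make that concrete, here is a minimal sketch in plain C++ (no Spirit involved) of what the unsigned int member holds for -25 and of the two ways to narrow it to 16 bits. The helper names are made up for illustration and do not appear in the question.

    ```cpp
    #include <cstdint>

    // Hypothetical helpers, named here for illustration only.

    // Converting a negative int to a 32-bit unsigned wraps modulo 2^32,
    // which is exactly what ends up in `number` for "-25".
    constexpr std::uint32_t as_stored_u32(int parsed) {
        return static_cast<std::uint32_t>(parsed);
    }

    // Option 1: clear the high 16 bits with a mask.
    constexpr std::uint32_t low16_by_mask(std::uint32_t v) {
        return v & 0xffffu;
    }

    // Option 2: coerce through uint16_t; widening back zero-extends.
    constexpr std::uint32_t low16_by_cast(std::uint32_t v) {
        return static_cast<std::uint16_t>(v);
    }
    ```

    Both options yield the same bit pattern; the mask is simply more explicit about which bits survive.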

    Storing 0xffe7 inside the unsigned int is of course possible. But it is technically not the 32-bit 2's complement encoding of -25 (that would be 0xffffffe7), so the stored value no longer means -25 on its own. Experience tells me this leads to error-prone code.

    If I were to go for a design like this, I'd choose an integer representation type that is expressly NOT an arithmetic type. Something like

    struct Number {
         _implementation_defined_ storage;
    
         uint32_t as_uint32() const { return /*some implementation logic on storage*/; }
         int16_t as_int16() const { return /*some other implementation logic on storage*/; }
         // etc.
    };
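
    One possible way to fill in that outline is sketched below. It assumes the storage is simply the raw 32-bit pattern; that choice, the constructor and the extra as_uint16() accessor are my assumptions for illustration. The point is that the type offers no arithmetic and no implicit conversions, so every reader has to state which interpretation it wants.

    ```cpp
    #include <cstdint>
    #include <type_traits>

    // Sketch only: storage is assumed to be the raw 32-bit pattern.
    class Number {
      public:
        constexpr explicit Number(std::uint32_t raw) : bits_(raw) {}

        // The full 32-bit pattern, unchanged.
        constexpr std::uint32_t as_uint32() const { return bits_; }

        // The low 16 bits, zero-extended when widened by the caller.
        constexpr std::uint16_t as_uint16() const {
            return static_cast<std::uint16_t>(bits_ & 0xffffu);
        }

        // The low 16 bits read as a 2's complement value (sign bit is 0x8000).
        constexpr std::int16_t as_int16() const {
            return static_cast<std::int16_t>(
                (bits_ & 0x8000u)
                    ? static_cast<std::int32_t>(bits_ & 0xffffu) - 0x10000
                    : static_cast<std::int32_t>(bits_ & 0xffffu));
        }

      private:
        std::uint32_t bits_;
    };

    // Deliberately not usable as a plain integer.
    static_assert(!std::is_arithmetic<Number>::value, "Number is not arithmetic");
    static_assert(!std::is_convertible<Number, std::uint32_t>::value,
                  "no implicit conversion to an integer");
    ```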
    

    In the land of parsed ASTs, I'd prefer

    template <typename V>
    struct VarCheck {
        std::string name;
        V number;
    };
    using Check = boost::variant<VarCheck<uint32_t>, VarCheck<int16_t>>;
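
    As a rough illustration of how such an AST could be consumed, the sketch below swaps in std::variant (C++17) for boost::variant so that it needs nothing but the standard library. The alias names and the to_bits helper are hypothetical. Each alternative knows its own width, so the 16-bit pattern is produced only where the type says the value is 16 bits wide.

    ```cpp
    #include <cstdint>
    #include <string>
    #include <variant>

    template <typename V>
    struct VarCheck {
        std::string name;
        V           number;
    };

    using U32Check = VarCheck<std::uint32_t>;
    using S16Check = VarCheck<std::int16_t>;
    // Assumption: std::variant stands in for boost::variant here.
    using Check = std::variant<U32Check, S16Check>;

    // Visitor producing the unsigned bit pattern to compare against raw data.
    struct ToBits {
        std::uint32_t operator()(const U32Check& c) const { return c.number; }
        std::uint32_t operator()(const S16Check& c) const {
            // Going through uint16_t keeps exactly 16 bits of the 2's complement value.
            return static_cast<std::uint16_t>(c.number);
        }
    };

    inline std::uint32_t to_bits(const Check& c) { return std::visit(ToBits{}, c); }
    ```

    With this shape, the narrowing lives in one visitor instead of being baked into the stored number.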
    

    With that out of the way, let's see some answers to your question:

    Using static cast in the semantic action

    You can force the issue using Boost Phoenix: Live On Coliru

    assign_d_ref %= qi::string("MSG33.D_REF") >> "==" >>
        qi::int_[_1 = boost::phoenix::static_cast_<uint16_t>(_1)];
    

    IMO, a slightly better approach¹ is to have a parser that parses uint16_t in the first place: Live On Coliru

    qi::int_parser<uint16_t> uint16_;
    assign_d_ref = qi::string("MSG33.D_REF") >> "==" >> uint16_;
    

    Other Improvements

    I'd also improve the expressiveness some more using e.g.:

    qi::symbols<char> s16_vars;
    s16_vars += "MSG33.D_REF", "MSG34.D_REF";
    assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
    

    This generalizes the rule to all signed 16-bit variables.

    qi::rule<It, std::string()> name;
    name       = +(qi::alnum | qi::char_("._"));
    

    This fixes the missing lexeme[] around the name (by declaring the rule without skipper²).

    assign_u32 = name >> "==" >> qi::uint_;
    assign     = assign_s16 | assign_u32;
    start      = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
    

    Apart from readability, moving eoi inside the skip[] directive fixes the edge case where blanks immediately precede end-of-input.

    See the combined result Live On Coliru

    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <cstdint>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <utility> // std::exchange
    #include <vector>
    
    namespace engine {
        struct Check {
            std::string  variable;
            uint32_t     number;
    
            friend std::ostream& operator<<(std::ostream& os, Check const& c) {
                auto f = os.flags();
                os << "{" << std::quoted(c.variable) << " == " //
                   << std::hex << std::showbase << c.number << "}";
                os.flags(f); // restore the saved flags (setf would only OR them in)
                return os;
            }
        };
    
        using Checks = std::vector<Check>;
    } // namespace engine
    
    BOOST_FUSION_ADAPT_STRUCT(engine::Check, variable, number)
    
    namespace engine {
        namespace qi = boost::spirit::qi;
    
        template <typename It> class Parser : public qi::grammar<It, Checks()> {
          public:
            Parser() : Parser::base_type(start) {
                using namespace qi::labels;
    
                s16_vars += "MSG33.D_REF", "MSG34.D_REF";
                name       = +(qi::alnum | qi::char_("._"));
                assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
                assign_u32 = name >> "==" >> qi::uint_;
                assign     = assign_s16 | assign_u32;
                start      = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
    
                BOOST_SPIRIT_DEBUG_NODES((start)(assign)(assign_u32)(assign_s16)(name))
            }
    
          private:
            qi::int_parser<uint16_t>              uint16_;
            qi::symbols<char>                     s16_vars;
            qi::rule<It, Check(), qi::blank_type> assign, assign_s16, assign_u32;
            qi::rule<It, Checks()>                start;
    
            // lexeme:
            qi::rule<It, std::string()> name;
        };
    
        Checks parse(const std::string& str) {
            using It = std::string::const_iterator;
            static const Parser<It> parser;
    
            Checks checks;
    
            It first = str.begin(), last = str.end();
            if (!qi::parse(first, last, parser, checks))
                return {};
    
            return checks;
        }
    } // namespace engine
    
    int main() {
        for (auto sep = ""; auto& c : engine::parse(
                 "MSG33.ANYTHING == 25 && MSG33.D_REF == 25 && MSG33.D_REF == -25"))
            std::cout << std::exchange(sep, " && ") << c;
        std::cout << "\n";
    }
    

    Printing (like all samples above):

    {"MSG33.ANYTHING" == 0x19} && {"MSG33.D_REF" == 0x19} && {"MSG33.D_REF" == 0xffe7}
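
    To tie this back to the use case in the question, here is a hedged sketch of how a stored check could be compared with a value extracted from the data stream. Everything about the stream is an assumption on my part: a byte-aligned, 16-bit, big-endian field, and the helper names read_u16_be and matches are invented for the example.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Mirrors the shape of engine::Check.
    struct Check {
        std::string   variable;
        std::uint32_t number;
    };

    // Assumption: the field is byte-aligned, 16 bits wide and big-endian.
    inline std::uint32_t read_u16_be(const std::vector<std::uint8_t>& buf,
                                     std::size_t offset) {
        return (static_cast<std::uint32_t>(buf.at(offset)) << 8) |
               static_cast<std::uint32_t>(buf.at(offset + 1));
    }

    // The extracted value is zero-extended, so it can only ever equal a
    // stored number whose high 16 bits are clear.
    inline bool matches(const Check& c, std::uint32_t extracted) {
        return c.number == extracted;
    }
    ```

    Under these assumptions a check stored as 0xffe7 matches the bytes ff e7, whereas one stored as 0xffffffe7 could never match a 16-bit field.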
    

    BONUS: Variant Style

    Because you might be interested, here's a version using the variant AST:

    Live On Coliru

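    #include <boost/core/demangle.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <cstdint>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <typeinfo>
    #include <utility> // std::exchange
    #include <vector>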
    
    namespace engine {
        template <typename T>
        struct VarCheck {
            std::string  variable;
            T            number;
    
            friend std::ostream& operator<<(std::ostream& os, VarCheck const& c) {
                auto f = os.flags();
                os << " {" << std::quoted(c.variable) << " == " << std::hex
                   << std::showbase << c.number << ":"
                   << boost::core::demangle(typeid(T).name()) << "}";
                os.flags(f); // restore the saved flags (setf would only OR them in)
                return os;
            }
        };
        using S16Var = VarCheck<int16_t>;
        using U32Var = VarCheck<uint32_t>;
        using Check  = boost::variant<U32Var, S16Var>;
    
        using Checks = std::vector<Check>;
    } // namespace engine
    
    // BOOST_FUSION_ADAPT_STRUCT(engine::S16Var, variable, number)
    // BOOST_FUSION_ADAPT_STRUCT(engine::U32Var, variable, number)
    // Or, generically: https://www.boost.org/doc/libs/1_80_0/libs/fusion/doc/html/fusion/adapted/adapt_tpl_struct.html
    BOOST_FUSION_ADAPT_TPL_STRUCT((T), (engine::VarCheck)(T), variable, number)
    
    namespace engine {
        namespace qi = boost::spirit::qi;
    
        template <typename It> class Parser : public qi::grammar<It, Checks()> {
          public:
            Parser() : Parser::base_type(start) {
                using namespace qi::labels;
    
                s16_vars += "MSG33.D_REF", "MSG34.D_REF";
                name       = +(qi::alnum | qi::char_("._"));
                assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
                assign_u32 = name >> "==" >> qi::uint_;
                assign     = assign_s16 | assign_u32;
                start      = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
    
                BOOST_SPIRIT_DEBUG_NODES((start)(assign)(assign_u32)(assign_s16)(name))
            }
    
          private:
            qi::int_parser<uint16_t>               uint16_;
            qi::symbols<char>                      s16_vars;
            qi::rule<It, Check(), qi::blank_type>  assign;
            qi::rule<It, U32Var(), qi::blank_type> assign_u32;
            qi::rule<It, S16Var(), qi::blank_type> assign_s16;
            qi::rule<It, Checks()>                 start;
    
            // lexeme:
            qi::rule<It, std::string()> name;
        };
    
        Checks parse(const std::string& str) {
            using It = std::string::const_iterator;
            static const Parser<It> parser;
    
            Checks checks;
    
            It first = str.begin(), last = str.end();
            if (!qi::parse(first, last, parser, checks))
                return {};
    
            return checks;
        }
    } // namespace engine
    
    int main() {
        for (auto  sep = "";
             auto& c : engine::parse("MSG33.ANYTHING == 25 && MSG33.D_REF == 25 && "
                                     "MSG33.D_REF == -25")) {
            std::cout << std::exchange(sep, "\n && ") << c;
        }
        std::cout << "\n";
    }
    

    I've extended the output with the static type information for visibility:

     {"MSG33.ANYTHING" == 0x19:unsigned int}
     &&  {"MSG33.D_REF" == 0x19:short}
     &&  {"MSG33.D_REF" == 0xffe7:short}
    

    It's easy to generalize this to more variable types:

    using S16Var = VarCheck<int16_t>;
    using U32Var = VarCheck<uint32_t>;
    using DblVar = VarCheck<double>;
    using StrVar = VarCheck<std::string>;
    using Check  = boost::variant<U32Var, S16Var, DblVar, StrVar>;
    

    See it Live On Coliru, with the output

     {"MSG33.ANYTHING" == 0x19:unsigned int}
     &&  {"MSG33.D_REF" == 0x19:short}
     &&  {"SEHE.DBL_1" == 4.2e+10:double}
     &&  {"SEHE.DBL_2" == -inf:double}
     &&  {"SEHE.STR_42" == Life The Universe and everything:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}
     &&  {"SEHE.STR_300" == Three hundred:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}
     &&  {"MSG33.D_REF" == 0xffe7:short}
    

    ¹ E.g. Boost Spirit: "Semantic actions are evil"?

    ² Boost spirit skipper issues