c++ boost boost-spirit-qi boost-phoenix

Misuse of boost::phoenix::static_cast_ to get object behind placeholder


Here is my issue. I am experimenting with boost::spirit::qi and trying to use placeholders like "_1" and "_a". I would like to access the underlying object "behind" a boost::qi/phoenix placeholder, but I'm struggling a bit.

Let's say I have the following class:

class Tag {
public:
  Tag() = default; // Needed by qi
  Tag(std::uint8_t _raw_tag) : m_raw_tag( _raw_tag ) {}

  std::uint8_t get_size() { return m_raw_tag & 0b111; }
  std::uint8_t get_type() { return m_raw_tag & 0b1000; }

private:
  std::uint8_t m_raw_tag;
};

I have to parse frames that start with a tag byte, which tells me what to read next. To do this, I have written a little helper class named Tag that unmasks pieces of information such as the type of the tag or the size of the data that follows. I always store the data in a std::uint32_t, but the size of the data may be 3 bytes rather than one of the pre-defined sizes 1, 2 or 4, for which I could use qi::byte_, qi::big_word or qi::big_dword respectively (assuming big endianness). Therefore, I'm thinking about reading the data byte by byte and bit-shifting each byte into the output std::uint32_t. That would give a parser like this, in pseudo C++ code:

template<typename _Iterator>
struct Read_frame : qi::grammar<_Iterator, std::uint32_t(), qi::locals<std::uint8_t>> {
  Read_frame() : Read_frame::base_type(data_parser)
  {
    using boost::spirit::qi::byte_;
    using boost::spirit::qi::omit;
    using boost::spirit::qi::repeat;
    using boost::spirit::qi::_val;
    using namespace qi::labels;
    tag_parser %= byte_;
    // we read what's in the tag but we don't store it
    // Calling the get_size() method of Tag is my issue; I don't know how to do it
    data_parser %= omit[tag_parser[ _a = _1.get_size()]] >> eps[_val = 0]
      >> repeat(_a)[ byte_[ _val += (_1 << (--_a * 8)) ] ];
  }

  qi::rule<_Iterator, std::uint32_t(), qi::locals<std::uint8_t>> data_parser;
  qi::rule<_Iterator, Tag()> tag_parser;
};

The line:

data_parser %= omit[tag_parser[ _a = _1.get_size()]] >> eps[_val = 0]

is where my problem lies. I don't know how to access a method of Tag in a semantic action. I thought about using boost::phoenix::static_cast_<Tag*>(&_1)->get_size() or something similar, but it does not work.
This is the first time I'm using boost::spirit together with boost::phoenix and, to be quite honest, I don't think I really understand how placeholders work in Boost, nor the principle behind boost::phoenix::static_cast_. That's why I'm asking for your help :). If you need more details, I will gladly provide them.

Thanks in advance,

A newbie with boost spirit


Solution

  • Semantic actions are lazy Phoenix actors. That is, they are "deferred functions". You can also see them as dynamically defined composed functions.

    The "value behind a placeholder" depends on the context, and that context only exists at runtime. The Phoenix transformation ("evaluation") uses that context to retrieve the actual object behind the placeholder during invocation.

    The last part is the point: any runtime effect must be deferred until invocation. That means that you need a Phoenix actor to access the get_size() method and lazily invoke it.
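
To make the "deferred function" idea concrete, here is a toy model in plain C++. It is not how Phoenix is implemented; Arg1, lazy_member_call and deferred_size are hypothetical names, and Tag is a simplified stand-in for the class in the question. It only illustrates why a placeholder has no get_size() member of its own and why the call has to be composed first and evaluated later.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the Tag class from the question.
struct Tag {
    std::uint8_t raw = 0;
    std::uint8_t get_size() const { return raw & 0b111; }
};

// Toy placeholder (hypothetical): it holds no object. It only knows how to
// pick the first argument once a call finally happens.
struct Arg1 {
    template <typename T> T const& operator()(T const& first) const { return first; }
};

// Toy bind (hypothetical): composes a member-function pointer with an inner
// actor and returns a new callable. Nothing is invoked until that runs.
template <typename Ret, typename Class, typename Actor>
auto lazy_member_call(Ret (Class::*fn)() const, Actor actor) {
    return [fn, actor](auto const& arg) { return (actor(arg).*fn)(); };
}

// Note: Arg1{}.get_size() would not compile, because Arg1 has no such member.
inline std::uint8_t deferred_size(Tag const& tag) {
    auto lazy_size = lazy_member_call(&Tag::get_size, Arg1{}); // composed now
    return lazy_size(tag);                                     // evaluated later
}
```

Calling deferred_size(Tag{0b1101}) yields 5: the composed callable only touches the Tag at the moment it is invoked, which is the role px::bind plays for the real placeholders.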

    Clumsy? You bet. The whole semantic-action eDSL is limited. Luckily, there are many ways to approach this.

    Let's demonstrate all or most of these.

    Step #1: Pinning Down Behaviour

    As mentioned in my comment, I'm a bit confused by the apparent behavior of the parser as given, so let's first pin it down using the phoenix::bind approach as an example:

    template <typename It> struct Read_frame : qi::grammar<It, uint32_t(), qi::locals<uint8_t>> {
        Read_frame() : Read_frame::base_type(data_parser) {
            using namespace qi::labels;
    
            tag_parser = qi::byte_;
    
            auto _size = px::bind(&Tag::get_size, _1);
            constexpr qi::_a_type _len;
    
            data_parser                                  //
                = tag_parser[(_len = _size, _val = 0)]   //
                >> qi::repeat(_len)[                     //
                       qi::byte_[_val += (_1 << --_len)] //
            ];
        }
    
        qi::rule<It, uint32_t(), qi::locals<uint8_t>> data_parser;
        qi::rule<It, Tag()> tag_parser;
    };
    

    Note several other simplifications/readability tricks. Now with some test cases Live On Compiler Explorer:

    PASS [] -> none
    PASS [0b00] -> optional(0)
    PASS [0b01] -> none
    PASS [0b01, 0b101010] -> optional(42)
    PASS [0b10, 0b101010] -> none
    PASS [0b10, 0b101010, 0b00] -> optional(84)
    PASS [0b11, 0b101010, 0b00, 0b00] -> optional(168)
    PASS [0b11111111] -> none
    PASS [0b11111111, 0b01, 0b10, 0b11, 0b100, 0b101, 0b110, 0b111] -> optional(247)
    

    Step #2: Simplify

    Instead of mutating the qi::local, I'd simply shift incrementally:

        data_parser                                    //
            = tag_parser[(_len = _size, _val = 0)]     //
            >> qi::repeat(_len)[                       //
                   qi::byte_[(_val <<= 1, _val += _1)] //
        ];
    

    We have the unit tests now to verify the behavior is the same: Live On Compiler Explorer.
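
The equivalence of the two accumulation strategies can also be checked without Spirit. The following is a plain C++ sketch of the arithmetic only; Bytes, payload_count, decode_countdown and decode_incremental are hypothetical helper names, and unlike a Qi parser these functions simply ignore any trailing bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Number of payload bytes announced by the tag, or nullopt if the input is
// empty or shorter than the tag promises.
inline std::optional<std::size_t> payload_count(Bytes const& in) {
    if (in.empty()) return std::nullopt;
    std::size_t const count = in[0] & 0b111;
    if (in.size() - 1 < count) return std::nullopt;
    return count;
}

// Step #1 style: each byte is shifted by a counter that counts down.
inline std::optional<std::uint32_t> decode_countdown(Bytes const& in) {
    auto const count = payload_count(in);
    if (!count) return std::nullopt;
    std::uint32_t val = 0;
    std::size_t shift = *count;
    for (std::size_t i = 1; i <= *count; ++i)
        val += static_cast<std::uint32_t>(in[i]) << --shift;
    return val;
}

// Step #2 style: shift the accumulator first, then add the next byte.
inline std::optional<std::uint32_t> decode_incremental(Bytes const& in) {
    auto const count = payload_count(in);
    if (!count) return std::nullopt;
    std::uint32_t val = 0;
    for (std::size_t i = 1; i <= *count; ++i) {
        val <<= 1;
        val += in[i];
    }
    return val;
}
```

Both functions shift by one bit per byte, exactly as the parsers above do, so they agree on every test case listed in Step #1. Packing whole bytes as the question describes would use a shift of 8 instead.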

    Step #3: Other Bind Approaches

    As promised:

    BONUS Simplify #2

    Still using Qi, I would note that there is nothing in the Tag that necessitates using that as an attribute type. In fact, we need only the trivial bit mask which might be a free function, if you really want. So, this minimal code does the same without much of the unneeded complexity:

    Live On Compiler Explorer

    #include <boost/phoenix.hpp>
    #include <boost/spirit/include/qi.hpp>
    namespace qi = boost::spirit::qi;
    
    template <typename It> struct Read_frame : qi::grammar<It, uint32_t(), qi::locals<uint8_t>> {
        Read_frame() : Read_frame::base_type(start) {
            using namespace qi::labels;
            start                                          //
                = qi::byte_[(_val = 0, _a = _1 & 0b111)]   //
                >> qi::repeat(_a)[                         //
                       qi::byte_[(_val <<= 1, _val += _1)] //
            ];
        }
    
      private:
        qi::rule<It, uint32_t(), qi::locals<uint8_t>> start;
    };
    

    A free function would be just as easy: Live

    start                                                         //
        = qi::byte_[(_val = 0, _a = px::bind(size_from_tag, _1))] //
        >> qi::repeat(_a)[                                        //
               qi::byte_[(_val <<= 1, _val += _1)]                //
    ];
    

    BONUS Simplify And Modernize

    In real life, I'd certainly code a custom parser. You can do so in Spirit Qi, but to go with the times, vastly reduce compile times and just generally make my life easier, I'd go with Spirit X3:

    Live On Compiler Explorer

    #include <boost/spirit/home/x3.hpp>
    
    namespace Readers {
        namespace x3 = boost::spirit::x3;
    
        static constexpr uint8_t size_from_tag(uint8_t tag) { return tag & 0b111; }
    
        struct frame_parser : x3::parser<frame_parser> {
            using attribute_type = uint32_t;
            bool parse(auto& first, auto last, auto&& /*ctx*/, auto&& /*rcontext*/, auto& attr) const {
                if (first == last)
                    return false;
                auto    save = first;
                uint8_t tag  = *first++;
                uint8_t len  = size_from_tag(tag);
    
                uint32_t val = 0;
                while (len && first != last) {
                    --len;
                    val <<= 1;
                    val += static_cast<uint8_t>(*first++);
                }
    
                if (len == 0) {
                    attr = val;
                    return true;
                }
                first = save;
                return false;
            }
        } static frame;
    } // namespace Readers
    
    #include <cassert>
    #include <fmt/ranges.h>
    #include <fmt/std.h>
    #include <optional>
    #include <vector>
    int main() {
        using Data = std::vector<uint8_t>;
    
        struct {
            Data                    input;
            std::optional<uint32_t> expected;
        } static const cases[]{
            {{}, {}}, // empty input, expect nothing in return
            {{0b0000}, 0},
            {{0b0001}, {}},                     // missing byte
            {{0b0001, 42}, 42},                 // 42
            {{0b0010, 42}, {}},                 // missing byte
            {{0b0010, 42, 0}, 2 * 42},          // 2*42
            {{0b0011, 42, 0, 0}, 4 * 42},       // 4*42
            {{0xff}, {}},                       // requires 7 bytes
            {{0xff, 1, 2, 3, 4, 5, 6, 7}, 247}, // like this
        };
    
        for (auto& [data, expected] : cases) {
            std::optional<uint32_t> actual;
    
            auto ok      = parse(begin(data), end(data), -Readers::frame, actual);
            auto pass    = (actual == expected);
            auto verdict = pass ? "PASS" : "FAIL";
            assert(ok); // optional parser should never fail, but we want to be sure
            if (pass)
                fmt::print("{} {::#04b} -> {}\n", verdict, data, actual);
            else
                fmt::print("{} {::#04b} -> {}\n\t *** expected: {}\n", verdict, data, actual, expected);
        }
    }
    

    Not only does this compile 10x¹ faster, I suspect it will also be way easier for the compiler to optimize. Indeed, this program:

    constexpr uint32_t parse_frame(auto const& input) {
        uint32_t v = 0; // same type as frame_parser::attribute_type
        parse(begin(input), end(input), Readers::x3::expect[Readers::frame], v);
        return v;
    }
    
    int main() {
        return parse_frame(std::array<uint8_t, 3>{0b0010, 42, 0}); // 2*42
    }
    

    Optimizes all the way to

    main:
            mov     eax, 84
            ret
    

    See it Live On Compiler Explorer including the generated assembly code


    ¹ proven by finger dipping