Here is my issue. I am experimenting with boost::spirit::qi and trying to use placeholders like "_1" and "_a". I would like to access the underlying object "behind" a boost::qi/phoenix placeholder, but I'm struggling a bit.
Let's say I have the following class:
class Tag {
public:
Tag() = default; // Needed by qi
Tag(std::uint8_t _raw_tag) : m_raw_tag( _raw_tag ) {}
std::uint8_t get_size() const { return m_raw_tag & 0b111; } // low three bits: size of the data that follows
std::uint8_t get_type() const { return m_raw_tag & 0b1000; } // bit 3: type of the tag
private:
std::uint8_t m_raw_tag;
};
I have to parse frames starting with a tag byte that gives information about what I have to read next. To do this, I have written a little helper class named Tag that unmasks these pieces of information, like the type of the tag or the size of the piece of data to come next. I always store the data in an std::uint32_t, but the size of the data may be 3 bytes rather than one of the predefined sizes 1, 2 or 4, for which I could use qi::byte_, qi::big_word or qi::big_dword respectively (assuming big endianness). Therefore, I'm thinking about reading the data byte by byte and bit-shifting each byte into the output std::uint32_t.
That would give a parser like this, in pseudo-C++:
template<typename _Iterator>
struct Read_frame : qi::grammar<_Iterator, std::uint32_t(), qi::locals<std::uint8_t>> {
Read_frame() : Read_frame::base_type(data_parser)
{
using boost::spirit::qi::byte_;
using boost::spirit::qi::omit;
using boost::spirit::qi::repeat;
using boost::spirit::qi::_val;
using namespace qi::labels;
tag_parser %= byte_;
// we read what's in the tag but we don't store it
// Calling Tag::get_size() on the parsed tag is my issue: I don't know how to do it
data_parser %= omit[tag_parser[ _a = _1.get_size()]] >> eps[_val = 0]
>> repeat(_a)[ byte_[ _val += (_1 << (--_a * 8)) ] ];
}
qi::rule<_Iterator, std::uint32_t(), qi::locals<std::uint8_t>> data_parser;
qi::rule<_Iterator, Tag()> tag_parser;
};
The line:
data_parser %= omit[tag_parser[ _a = _1.get_size()]] >> eps[_val = 0]
is where my problem lies. I don't know how to call a method of Tag in a semantic action. I thought about using boost::phoenix::static_cast_<Tag*>(&_1)->get_size() or something similar, but it does not work.
This is the first time I'm using boost::spirit together with boost::phoenix, and to be quite honest I don't think I really understand how the placeholders in boost work, nor the principle behind boost::phoenix::static_cast_. That's why I'm kindly asking for your help :). If you need more details, I will gladly provide them.
Thanks in advance,
A newbie with boost spirit
Semantic actions are lazy phoenix actors. That is, they are "deferred functions". You can also see them as dynamically defined composed functions.
The "value behind a placeholder" depends on the context, and that context only exists at runtime. The Phoenix transformation ("evaluation") uses that context to retrieve the actual object behind the placeholder during invocation.
The last part is the point: any runtime effect must be deferred until invocation. That means that you need a Phoenix actor to access the get_size() method and lazily invoke it.
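To make "deferred" concrete, here is a plain C++ analogy with no Phoenix involved. It is only a sketch: Context, Placeholder1 and lazy_get_size are hypothetical names invented for illustration, and Tag is a cut-down copy of the class from the question with a const getter. The point it tries to show is why `_1.get_size()` cannot work: the placeholder is not a Tag, it is a function object that yields a Tag once a context exists.

```cpp
#include <cassert>
#include <cstdint>

// Cut-down stand-in for the question's Tag class (getter made const here).
class Tag {
  public:
    Tag() = default;
    Tag(std::uint8_t raw_tag) : m_raw_tag(raw_tag) {}
    std::uint8_t get_size() const { return m_raw_tag & 0b111; }

  private:
    std::uint8_t m_raw_tag = 0;
};

// Hypothetical stand-in for the runtime context a semantic action receives.
struct Context {
    Tag attribute; // what `_1` would refer to during invocation
};

// A "placeholder" holds no Tag. It only knows how to fetch one from a
// context once that context exists.
struct Placeholder1 {
    Tag const& operator()(Context const& ctx) const { return ctx.attribute; }
};

// A lazy wrapper: composes an actor with the member call and returns a new
// actor. Nothing is evaluated until the result is invoked with a context.
template <typename Actor>
auto lazy_get_size(Actor actor) {
    return [actor](Context const& ctx) { return actor(ctx).get_size(); };
}
```

Building `lazy_get_size(Placeholder1{})` touches no Tag at all; only calling the result with a Context does. That is roughly the job phoenix::bind and phoenix::function do for you, with far more generality.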
Clumsy? You bet. The whole semantic-action eDSL is limited. Luckily, there are many ways to approach this:
- you can use phoenix::bind with a pointer-to-member function
- you can use many predefined lazy functions for things like construction or most of STL (#include <boost/phoenix/stl.hpp>). Incidentally, phoenix::size doesn't work for your type because it doesn't adhere to STL conventions (it expects size_t T::size() const instead of get_size).
- you can write your own actors as polymorphic function objects, and adapt them with either phoenix::function<> or the adaptation macros (BOOST_PHOENIX_ADAPT_CALLABLE). In fact, my favorite take on this has become px::function f = [](auto& a, auto& b) { return a + b; };, fully leveraging C++17 CTAD
Let's demonstrate all or most of these.
As mentioned in my comment, I'm a bit confused by the apparent behavior of the parser as given, so let's first pin it down using the phoenix::bind approach as an example:
template <typename It> struct Read_frame : qi::grammar<It, uint32_t(), qi::locals<uint8_t>> {
Read_frame() : Read_frame::base_type(data_parser) {
using namespace qi::labels;
tag_parser = qi::byte_;
auto _size = px::bind(&Tag::get_size, _1);
constexpr qi::_a_type _len;
data_parser //
= tag_parser[(_len = _size, _val = 0)] //
>> qi::repeat(_len)[ //
qi::byte_[_val += (_1 << --_len)] //
];
}
qi::rule<It, uint32_t(), qi::locals<uint8_t>> data_parser;
qi::rule<It, Tag()> tag_parser;
};
Note several other simplifications and readability tricks. Now with some test cases, Live On Compiler Explorer, this prints:
PASS [] -> none
PASS [0b00] -> optional(0)
PASS [0b01] -> none
PASS [0b01, 0b101010] -> optional(42)
PASS [0b10, 0b101010] -> none
PASS [0b10, 0b101010, 0b00] -> optional(84)
PASS [0b11, 0b101010, 0b00, 0b00] -> optional(168)
PASS [0b11111111] -> none
PASS [0b11111111, 0b01, 0b10, 0b11, 0b100, 0b101, 0b110, 0b111] -> optional(247)
Instead of mutating the qi::locals variable, I'd simply shift incrementally:
data_parser //
= tag_parser[(_len = _size, _val = 0)] //
>> qi::repeat(_len)[ //
qi::byte_[(_val <<= 1, _val += _1)] //
];
We have the unit tests now to verify the behavior is the same: Live On Compiler Explorer.
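For anyone who wants to convince themselves of that equivalence without Spirit, the two accumulation strategies can be sketched as plain functions. The names accumulate_positional and accumulate_incremental, and the bits_per_step parameter, are my own inventions for this illustration. A step of 1 matches the listings and test expectations above; a step of 8 would pack whole bytes, which is what the question's `--_a * 8` seems to aim for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Positional form: each byte is shifted straight to its final position,
// counting the remaining bytes down (like mutating the local in the rule).
inline std::uint32_t accumulate_positional(std::vector<std::uint8_t> const& bytes,
                                           unsigned bits_per_step) {
    std::uint32_t val = 0;
    std::size_t remaining = bytes.size();
    for (std::uint8_t b : bytes) {
        --remaining;
        unsigned const shift = static_cast<unsigned>(remaining) * bits_per_step;
        assert(shift < 32); // shifting a 32-bit value by 32 or more is undefined
        val += static_cast<std::uint32_t>(b) << shift;
    }
    return val;
}

// Incremental form: shift what has been accumulated so far, then add the
// next byte. No counter has to be mutated.
inline std::uint32_t accumulate_incremental(std::vector<std::uint8_t> const& bytes,
                                            unsigned bits_per_step) {
    assert(bits_per_step < 32);
    std::uint32_t val = 0;
    for (std::uint8_t b : bytes) {
        val <<= bits_per_step;
        val += b;
    }
    return val;
}
```

With a step of 1, the payload 1, 2, 3, 4, 5, 6, 7 gives 64 + 64 + 48 + 32 + 20 + 12 + 7 = 247 either way, which is the value in the last test case.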
As promised, using phoenix::function and C++17 lambda goodness: Live
px::function get_size = [](Tag const& tag) { return tag.get_size(); };
data_parser //
= tag_parser[(_len = get_size(_1), _val = 0)] //
>> qi::repeat(_len)[ //
qi::byte_[(_val <<= 1, _val += _1)] //
];
Note that the nature of deferred function objects is polymorphic, so this works just the same:
px::function get_size = [](auto& tag) { return tag.get_size(); };
The same without the C++17 goodness: Live
template <typename It> struct Read_frame : qi::grammar<It, uint32_t(), qi::locals<uint8_t>> {
Read_frame() : Read_frame::base_type(data_parser) {
using namespace qi::labels;
constexpr qi::_a_type _len;
tag_parser = qi::byte_;
data_parser //
= tag_parser[(_len = get_size(_1), _val = 0)] //
>> qi::repeat(_len)[ //
qi::byte_[(_val <<= 1, _val += _1)] //
];
}
private:
struct get_size_f {
auto operator()(Tag const& tag) const { return tag.get_size(); };
};
px::function<get_size_f> get_size{};
qi::rule<It, uint32_t(), qi::locals<uint8_t>> data_parser;
qi::rule<It, Tag()> tag_parser;
};
Using adaptation macros (BOOST_PHOENIX_ADAPT_CALLABLE): Live
namespace {
struct get_size_f {
auto operator()(Tag const& tag) const { return tag.get_size(); };
};
BOOST_PHOENIX_ADAPT_CALLABLE(get_size_, get_size_f, 1);
} // namespace
template <typename It> struct Read_frame : qi::grammar<It, uint32_t(), qi::locals<uint8_t>> {
Read_frame() : Read_frame::base_type(data_parser) {
using namespace qi::labels;
constexpr qi::_a_type _len;
tag_parser = qi::byte_;
data_parser //
= tag_parser[(_len = get_size_(_1), _val = 0)] //
>> qi::repeat(_len)[ //
qi::byte_[(_val <<= 1, _val += _1)] //
];
}
private:
qi::rule<It, uint32_t(), qi::locals<uint8_t>> data_parser;
qi::rule<It, Tag()> tag_parser;
};
Still using Qi, I would note that there is nothing in Tag that necessitates using it as an attribute type. In fact, we only need the trivial bit mask, which could be a free function if you really want one. So, this minimal code does the same without much of the unneeded complexity:
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename It> struct Read_frame : qi::grammar<It, uint32_t(), qi::locals<uint8_t>> {
Read_frame() : Read_frame::base_type(start) {
using namespace qi::labels;
start //
= qi::byte_[(_val = 0, _a = _1 & 0b111)] //
>> qi::repeat(_a)[ //
qi::byte_[(_val <<= 1, _val += _1)] //
];
}
private:
qi::rule<It, uint32_t(), qi::locals<uint8_t>> start;
};
A free function (size_from_tag, defined in the X3 listing further down) would be just as easy: Live
start //
= qi::byte_[(_val = 0, _a = px::bind(size_from_tag, _1))] //
>> qi::repeat(_a)[ //
qi::byte_[(_val <<= 1, _val += _1)] //
];
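For reference, the free functions could look roughly like this. size_from_tag uses the same mask as Tag::get_size() in the question. type_from_tag is a hypothetical name, and returning bool rather than the raw masked byte is my own choice for the sketch, not something taken from the question.

```cpp
#include <cassert>
#include <cstdint>

// Low three bits of the tag byte: number of payload bytes that follow.
constexpr std::uint8_t size_from_tag(std::uint8_t tag) { return tag & 0b111; }

// Bit 3 of the tag byte, reported as a flag. Hypothetical helper mirroring
// the mask used by Tag::get_type() in the question.
constexpr bool type_from_tag(std::uint8_t tag) { return (tag & 0b1000) != 0; }

// Both are usable at compile time, so the masks can be checked statically.
static_assert(size_from_tag(0b0010) == 2, "two payload bytes");
static_assert(size_from_tag(0xff) == 7, "upper bits are masked away");
static_assert(type_from_tag(0b1000), "type bit set");
static_assert(!type_from_tag(0b0111), "type bit clear");
```

Being constexpr and stateless, such helpers work unchanged with px::bind as in the rule above, or directly inside a hand-written parser.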
In real life, I'd certainly code a custom parser. You can do so in Spirit Qi, but to go with the times, vastly reduce compile times and just generally make my life easier, I'd go with Spirit X3:
#include <boost/spirit/home/x3.hpp>
namespace Readers {
namespace x3 = boost::spirit::x3;
static constexpr uint8_t size_from_tag(uint8_t tag) { return tag & 0b111; }
struct frame_parser : x3::parser<frame_parser> {
using attribute_type = uint32_t;
bool parse(auto& first, auto last, auto&& /*ctx*/, auto&& /*rcontext*/, auto& attr) const {
if (first == last)
return false;
auto save = first;
uint8_t tag = *first++;
uint8_t len = size_from_tag(tag);
uint32_t val = 0;
while (len && first != last) {
--len;
val <<= 1;
val += static_cast<uint8_t>(*first++);
}
if (len == 0) {
attr = val;
return true;
}
first = save;
return false;
}
} static frame;
} // namespace Readers
#include <cassert>
#include <fmt/ranges.h>
#include <fmt/std.h>
#include <optional>
#include <vector>
int main() {
using Data = std::vector<uint8_t>;
struct {
Data input;
std::optional<uint32_t> expected;
} static const cases[]{
{{}, {}}, // empty input, expect nothing in return
{{0b0000}, 0},
{{0b0001}, {}}, // missing byte
{{0b0001, 42}, 42}, // 42
{{0b0010, 42}, {}}, // missing byte
{{0b0010, 42, 0}, 2 * 42}, // 2*42
{{0b0011, 42, 0, 0}, 4 * 42}, // 4*42
{{0xff}, {}}, // requires 7 bytes
{{0xff, 1, 2, 3, 4, 5, 6, 7}, 247}, // like this
};
for (auto& [data, expected] : cases) {
std::optional<uint32_t> actual;
auto ok = parse(begin(data), end(data), -Readers::frame, actual);
auto pass = (actual == expected);
auto verdict = pass ? "PASS" : "FAIL";
assert(ok); // optional parser should never fail, but we want to be sure
if (pass)
fmt::print("{} {::#04b} -> {}\n", verdict, data, actual);
else
fmt::print("{} {::#04b} -> {}\n\t *** expected: {}\n", verdict, data, actual, expected);
}
}
Not only does this compile 10x¹ faster, but I suspect it will also be way easier for the compiler to optimize. Indeed, this program
constexpr uint32_t parse_frame(auto const& input) {
uint32_t v = 0; // same type as frame_parser::attribute_type, so larger values are not truncated
parse(begin(input), end(input), x3::expect[Readers::frame], v);
return v;
}
int main() {
return parse_frame(std::array<uint8_t, 3>{0b0010, 42, 0}); // 2*42
}
optimizes all the way to
main:
mov eax, 84
ret
See it Live On Compiler Explorer including the generated assembly code
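If you want to check the expected values in the test table without Boost or fmt installed, the loop inside frame_parser::parse can be lifted into a standalone function. This is only a sketch: read_frame is a hypothetical name, it is not part of Spirit, and it assumes the same 1-bit shift and three-bit length mask as the listings above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: parses one frame from [first, last).
// On success it returns the value and advances `first` past the frame.
// On failure it leaves `first` untouched, like the rollback in the X3 parser.
template <typename It>
std::optional<std::uint32_t> read_frame(It& first, It last) {
    if (first == last)
        return std::nullopt; // no tag byte at all

    It cursor = first;
    std::uint8_t const tag = static_cast<std::uint8_t>(*cursor++);
    std::uint8_t len = tag & 0b111; // payload length from the low three bits

    std::uint32_t val = 0;
    while (len && cursor != last) {
        --len;
        val <<= 1; // same 1-bit shift as in the listings above
        val += static_cast<std::uint8_t>(*cursor++);
    }
    if (len != 0)
        return std::nullopt; // input ended before the payload was complete

    first = cursor; // commit the consumed input only on success
    return val;
}

// Convenience overload for a whole buffer (also hypothetical).
inline std::optional<std::uint32_t> read_frame(std::vector<std::uint8_t> const& data) {
    auto first = data.begin();
    return read_frame(first, data.end());
}
```

Calling `read_frame({0b0010, 42, 0})` follows the same path as the third-to-last row of the table: the tag announces two payload bytes, 42 is shifted once and 0 is added.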
¹ proven by finger dipping