How to use u8_to_u32_iterator in Boost Spirit X3?


I am using Boost Spirit X3 to create a programming language, but when I try to support Unicode, I get an error.
Here is a simplified version of the program that reproduces it:

#define BOOST_SPIRIT_X3_UNICODE
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

struct sample : x3::symbols<unsigned> {
    sample()
    {
        add("48", 10);
    }
};

int main()
{
  const std::string s("🌸");

  boost::u8_to_u32_iterator<std::string::const_iterator> first{cbegin(s)},
    last{cend(s)};

  x3::parse(first, last, sample{});
}

What should I do?


Solution

  • As you noticed, internally char_encoding::unicode employs char32_t.

    So, first change the symbols parser accordingly:

    template <typename T>
    using symbols = x3::symbols_parser<boost::spirit::char_encoding::unicode, T>;
    
    struct sample : symbols<unsigned> {
        sample() { add(U"48", 10); }
    };
    

    Now the code fails when calling into case_compare:

    /home/sehe/custom/boost_1_78_0/boost/spirit/home/x3/string/detail/tst.hpp|74 col 33| error: no match for call to ‘(boost::spirit::x3::case_compare<boost::spirit::char_encoding::unicode>) (reference, char32_t&)’
    

    As you can see, it expects a char32_t reference, but by default u8_to_u32_iterator yields unsigned ints (std::uint32_t), which is a distinct type from char32_t.

    Just for comparison / sanity check: https://godbolt.org/z/1zozxq96W

    Luckily, you can instruct u8_to_u32_iterator to use another value type through its second template parameter:

    #define BOOST_SPIRIT_X3_UNICODE
    #include <boost/spirit/home/x3.hpp>
    #include <boost/regex/pending/unicode_iterator.hpp> // boost::u8_to_u32_iterator
    #include <iostream>
    #include <string>
    
    namespace x3 = boost::spirit::x3;
    
    template <typename T>
    using symbols = x3::symbols_parser<boost::spirit::char_encoding::unicode, T>;
    
    struct sample : symbols<unsigned> {
        sample() { add(U"48", 10)(U"🌸", 11); }
    };
    
    int main() {
        auto test = [](auto const& s) {
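            // The second template argument makes the iterator yield char32_t
            // instead of the default std::uint32_t.
            boost::u8_to_u32_iterator<decltype(cbegin(s)), char32_t> first{
                cbegin(s)},
                last{cend(s)};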
    
            unsigned parsed_value;
            if (x3::parse(first, last, sample{}, parsed_value)) {
                std::cout << s << " -> " << parsed_value << "\n";
            } else {
                std::cout << s << " FAIL\n";
            }
        };
    
        for (std::string s : {"🌸", "48", "🤷"})
            test(s);
    }
    

    Prints

    🌸 -> 11
    48 -> 10
    🤷 FAIL
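
The type mismatch behind the case_compare error can be reproduced without Boost. The sketch below is a simplified stand-in, not the library's code: `compare_like` is a hypothetical comparator that, by assumption, deduces a single character type for both of its arguments, which would be consistent with the "no match for call" error quoted above. It shows that std::uint32_t and char32_t have the same size yet are distinct types, so a mixed call cannot be deduced, while a call with two char32_t values can. It needs C++17 for std::is_invocable_v.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in (NOT Boost code): a comparator whose call operator
// deduces one character type for both arguments. That the real case_compare
// is declared this way is an assumption made for illustration only.
struct compare_like {
    template <typename Char>
    constexpr std::int32_t operator()(Char lhs, Char rhs) const {
        return static_cast<std::int32_t>(lhs) - static_cast<std::int32_t>(rhs);
    }
};

// Same width, but distinct types as far as template deduction is concerned.
static_assert(sizeof(std::uint32_t) == sizeof(char32_t), "same size");
static_assert(!std::is_same_v<std::uint32_t, char32_t>, "distinct types");

// Mixed arguments: Char would have to be both std::uint32_t and char32_t,
// so deduction fails and the call is not viable.
constexpr bool mixed_call_ok =
    std::is_invocable_v<compare_like, std::uint32_t, char32_t&>;

// Matching arguments: Char is deduced as char32_t for both.
constexpr bool matching_call_ok =
    std::is_invocable_v<compare_like, char32_t, char32_t&>;

static_assert(!mixed_call_ok, "uint32_t vs char32_t& must not be callable");
static_assert(matching_call_ok, "char32_t vs char32_t& must be callable");
```

Under that assumption, asking u8_to_u32_iterator for char32_t as its value type removes the conflict, because both arguments then have the same character type.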