Converting a Boost Spirit Lex semantic action to Phoenix - How to access _val?


I wrote a semantic action for my Boost Spirit Lexer to convert escape sequences in strings to what they stand for. It works perfectly and I want to convert it to a Boost Phoenix expression, but can't get that one to compile.

Here is what works:

// the semantic action
struct ConvertEscapes
{
    template <typename ItT, typename IdT, typename CtxT>
    void operator () (ItT& start, ItT& end, lex::pass_flags& matched, IdT& id, CtxT& ctx)
    {
        static boost::wregex escapeRgx(L"(\\\\r)|(\\\\n)|(\\\\t)|(\\\\\\\\)|(\\\\\")");
        static std::wstring escapeRepl = L"(?1\r)(?2\n)(?3\t)(?4\\\\)(?5\")";
        static std::wstring wval; // static b/c set_value doesn't seem to copy

        auto const& val = ctx.get_value();
        wval.assign(val.begin(), val.end());
        wval = boost::regex_replace(wval, 
                                    escapeRgx, 
                                    escapeRepl, 
                                    boost::match_default | boost::format_all);
        ctx.set_value(wval);
    }
};

// the token declaration
lex::token_def<std::wstring, wchar_t> literal_str;

// the token definition
literal_str  = L"\\\"([^\\\\\"]|(\\\\.))*\\\""; // string with escapes

// adding it to the lexer
this->self += literal_str [ ConvertEscapes() ];

This is what I tried to convert it:

this->self += literal_str 
[ 
    lex::_val = boost::regex_replace(lex::_val /* this is the place I can't figure out */,
                                     boost::wregex(L"(\\\\r)|(\\\\n)|(\\\\t)|(\\\\\\\\)|(\\\\\")"), 
                                     L"(?1\r)(?2\n)(?3\t)(?4\\\\)(?5\")", 
                                     boost::match_default | boost::format_all) 
];

A wstring can't be constructed from _val. _val also doesn't have begin() or end(), so how is it supposed to be used?

This std::wstring(lex::_start, lex::_end) fails, too, because those arguments aren't recognized as iterators.

In this question, I found phoenix::construct<std::wstring>(lex::_start, lex::_end), but this also doesn't really result in a wstring.

How do I get either a string or a pair of wchar_t iterators for the current token?


Solution

  • I'm going to chant the oft-heard "Why?"

    This time, for good reason.

    In general, avoid semantic actions; see Boost Spirit: "Semantic actions are evil"?

    Phoenix actors are needlessly more complex than the dedicated functor. They have a sweet spot (mainly simple assignments or built-in operations). But as soon as the actor is non-trivial you'll see the complexity ramp up quickly, not just for the human but also for the compiler. This leads to

    Interestingly: Spirit X3 dropped Phoenix altogether, even though Phoenix was once the brainchild of Spirit³.

    The new style uses C++14 polymorphic lambdas that look 90% like the helper function object in the original code, but written inline as a lambda.
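To give a feel for that lambda style, here is a minimal sketch that uses only the standard library. The name convert_escapes is hypothetical, the hand-written loop stands in for the boost::regex_replace call from the question, and the lambda takes a plain iterator pair rather than Spirit's semantic-action arguments, so treat it as an illustration and not as a drop-in action.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a C++14 generic (polymorphic) lambda that turns the
// escape sequences \r \n \t \\ \" into the characters they stand for.
// It accepts any pair of forward iterators over wchar_t.
auto const convert_escapes = [](auto first, auto last) {
    std::wstring out;
    for (; first != last; ++first) {
        if (*first != L'\\') { out += *first; continue; }
        auto next = first;
        ++next;
        if (next == last) { out += L'\\'; break; }  // lone trailing backslash is kept
        switch (*next) {
            case L'r':  out += L'\r'; break;
            case L'n':  out += L'\n'; break;
            case L't':  out += L'\t'; break;
            case L'\\': out += L'\\'; break;
            case L'"':  out += L'"';  break;
            default:    out += L'\\'; out += *next; break;  // unknown escape left untouched
        }
        first = next;  // skip the character that followed the backslash
    }
    return out;
};
```

Because the parameters are declared auto, the same lambda works for std::wstring iterators, raw wchar_t pointers, or any other iterator type, which is the property that makes the inline style a close match for the templated functor.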

    This specific case

    Can't work. At all.

    The problem is that you're mixing lazy/deferred actors with direct invocations. That can never work. The type of phoenix::construct<std::wstring>(lex::_start, lex::_end) isn't supposed to be std::wstring. Of course. It is supposed to be a lazy actor¹ that can be used at some later time to create a std::wstring.

    Now that we know that (and why) phoenix::construct<std::wstring>(lex::_start, lex::_end) is an actor type, it should become clear why it is completely bogus to call boost::regex_replace on it. You might as well say

    struct implementation_defined {} bogus;
    boost::regex_replace(bogus, re, fmt, boost::match_default | boost::format_all);
    

    And wonder why it would not compile.
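The lazy-versus-eager mismatch can be reproduced without Phoenix at all. The sketch below is an analogy under stated assumptions: shout and make_string_later are hypothetical names, with shout standing in for an eager function like boost::regex_replace and make_string_later standing in for phoenix::construct<std::wstring>(...).

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for an eager function: it needs a real string right now.
inline std::wstring shout(std::wstring const& s) { return s + L"!"; }

// Stand-in for a lazy actor (hypothetical, not Phoenix): calling this does
// not build a string. It returns a function object that builds one later.
inline auto make_string_later(wchar_t const* first, wchar_t const* last) {
    return [first, last] { return std::wstring(first, last); };
}

// The returned closure is simply not a std::wstring, so it cannot be passed
// where an eager function expects one.
static_assert(
    !std::is_convertible<decltype(make_string_later(nullptr, nullptr)),
                         std::wstring>::value,
    "a deferred object is not the value it will eventually produce");

// shout(make_string_later(b, e));    // would not compile
// shout(make_string_later(b, e)());  // fine: invoke the deferred object first
```

In a semantic action nobody invokes the actor at the point where the expression is written, which is why the eager call has nothing usable to work with.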

    Summary:

    You should probably just have the dedicated functor. You can of course Phoenix-adapt the regex functions you require, but all it does is shift the complexity tax for some syntactic sugar.

    I'd always opt for the more naive approach that is going to be more understandable to a seasoned C++ programmer, and avoids pitfalls that come with high-wire acts².

    Nevertheless, here's a pointer should you be curious:

    http://www.boost.org/doc/libs/1_63_0/libs/phoenix/doc/html/phoenix/modules/function.html

    Live On Coliru

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>
    #include <boost/phoenix.hpp>
    #include <boost/mpl/vector.hpp>
    #include <boost/spirit/include/lex_lexer.hpp>
    #include <boost/spirit/include/lex_lexertl.hpp>
    #include <boost/spirit/include/lex.hpp>
    
    namespace lex = boost::spirit::lex;
    
    BOOST_PHOENIX_ADAPT_FUNCTION(std::wstring, regex_replace_, boost::regex_replace, 4)
    
    template <typename... T>
    struct Lexer : lex::lexer<T...> {
        Lexer() {
            // the token definition
            literal_str  = L"\\\"([^\\\\\"]|(\\\\.))*\\\""; // string with escapes
    
            // adding it to the lexer
            this->self += literal_str [
                lex::_val = regex_replace_(lex::_val,
                     boost::wregex(L"(\\\\r)|(\\\\n)|(\\\\t)|(\\\\\\\\)|(\\\\\")"), 
                     L"(?1\r)(?2\n)(?3\t)(?4\\\\)(?5\")", 
                     boost::match_default | boost::format_all) 
    
            ];
        }
    
        // the token declaration
        lex::token_def<std::wstring, wchar_t> literal_str;
    };
    
    int main() {
        typedef lex::lexertl::token<std::wstring::const_iterator, boost::mpl::vector<std::wstring, wchar_t>> token_type;
        typedef Lexer<lex::lexertl::actor_lexer<token_type>> lexer_type;
        typedef lexer_type::iterator_type lexer_iterator_type;
    }
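
What adapting a function for lazy use amounts to can be sketched in plain C++14. This is only a conceptual model, not how Phoenix is implemented: lazy, val, constant, append and Context are all hypothetical names, with val playing the role of lex::_val and Context standing in for the lexer context.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Phoenix API): lift an eager callable f into a lazy
// one. lazy(f)(actors...) returns a function object; only when that object is
// invoked with a context are the argument actors evaluated and f called.
template <typename F>
auto lazy(F f) {
    return [f](auto... actors) {
        return [=](auto& ctx) { return f(actors(ctx)...); };
    };
}

// Stand-in for the lexer context that holds the current token value.
struct Context { std::wstring value; };

// Placeholder actor: yields the token value once a context is supplied.
auto const val = [](auto& ctx) -> std::wstring& { return ctx.value; };

// Actor that ignores the context and always yields a stored constant.
template <typename T>
auto constant(T v) { return [v](auto&) { return v; }; }

// An ordinary eager function to be lifted.
inline std::wstring append(std::wstring s, std::wstring const& suffix) {
    return s + suffix;
}
```

With these pieces, lazy(append)(val, constant(std::wstring(L"!"))) builds an action object without touching any string; the work happens only when that object is later called with a Context. The extra layer is the "complexity tax" mentioned in the summary.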
    

    ¹ think composed function object that can be invoked at a later time

    ² the balance might tip if you were designing this as an EDSL for further configuration by non-experts, but then you will have the added responsibility of documenting your EDSL and the constraints in which it can be used

    ³ should we say, spirit-child of a brain?