
Boost Spirit grammar to skip PHP comments: is this working code written with the currently recommended Boost parser?


I've written a function that strips all comments and a few other elements from PHP code. It's working fine, but as I do not deeply understand the code, I have some doubts:

EDIT: The goal of the function is to return any code in the PHP file that is neither a comment, nor a (use|namespace) statement, nor one of the tags <?php .. ?>.

I am using templates to autogenerate PHP code, and once the code is created, I can add custom code. Before calling this function I delete all the automatically generated code, and then this function tells me whether I have added any custom code to the PHP file.

That is why I say it is working: I don't care about the code itself, I just want to know whether there is any custom code at all.

EDIT 2:

Example of input string:

<?php

namespace tests;
use codeception/tests;

/* This unit test tests something */

class Tester {

    /// @group debug
    public function testsFeature(/*AcceptanceTester*/ $I) {
        $I->assertTrue($this->testsAll());
    }
}
?>

And the required output:

classTester{publicfunctiontestsFeature($I){$I->assertTrue($this->testsAll());}}

In fact, the result itself is of no use to me; I just need to know whether it is empty.

There are other approaches to the whole problem, like regenerating the template into a temp file and diffing it to find the additions, but 1) that would be far more expensive, and 2) I really want to learn to use the Boost grammar parsers.


Solution

  • Oh I see, the whole thing was a bit inside-out. You are "parsing" the stuff that you want to "skip" and "skipping" the stuff you need "outside the parser".

    It seems a lot more straightforward to have a parser and a skipper in their designated roles. Let's create a StripCommentParser:

    #include <boost/spirit/include/qi.hpp>
    #include <string>
    namespace qi = boost::spirit::qi;

    using Iterator = std::string::const_iterator;
    struct StripCommentParser : qi::grammar<Iterator, std::string()> {
    

    This declares the output std::string which we will use to collect the desired output. I'd put all the bits together like so:

    struct StripCommentParser : qi::grammar<Iterator, std::string()> {
        StripCommentParser() : StripCommentParser::base_type(start) {
    
            using namespace qi;
            single_line_comment = "//" >> *(qi::char_ - eol) >> (eol | eoi);
            // PHP block comments do not nest, so the rule is not recursive
            block_comment       = ("/*" >> *(qi::char_ - "*/")) > ("*/" | eoi);
            php_tag             = lit("<?php") | lit("?>");
            php_comment         = '#' >> *(qi::char_ - eol) >> (eol | eoi);
            php_namespace       = lit("namespace ") >> *(qi::char_ - (eol | ';')) >> (eol | ';');
            php_use             = lit("use ") >> *(qi::char_ - (eol | ';')) >> (eol | ';');
    
            start = qi::skip(space | single_line_comment | block_comment | php_tag | php_namespace | php_use | php_comment)[*char_];
        }
    
      private:
        qi::rule<Iterator, std::string()> start;
        qi::rule<Iterator> block_comment, single_line_comment, php_tag, php_comment, php_namespace, php_use;
    };
    
    std::string non_comments_php_code(std::string const& contents) {
        std::string non_comments_code;
        parse(begin(contents), end(contents), StripCommentParser{}, non_comments_code);
        return non_comments_code;
    }
    

    Notes:

    Observations

    Questions

    Q. Am I using the latest technology to parse a grammar in boost? A few years ago I used only Spirit but I didn't use qi.

    That's interesting. "Years ago" is when I'd use Qi. Nowadays I still recommend Qi, but you have the option of using C++14 Spirit X3 (going C++17 now).

    If all you're doing is squeezing ignorable input then I'd say X3 is a better choice. However there are areas where I think X3 isn't as mature (e.g. attribute propagation/handling).

    Q. Is this the right approach with spirit?

    Yes and no. Yes, in the sense that you create rules. No, in the sense that you used the skipper as the grammar (and as the skipper too) and then wrote your own parser around it. I think the example above is what you want.

    Q. What is the reason for putting the grammar inside a block of code?

    Scope. That's what blocks do. In this case the block limits the scope of all the detail rules, as well as of the using namespace directive. The struct serves the same purpose, but packages everything up in a reusable instance.