c++benchmarkingboost-spiritboost-spirit-qiboost-spirit-lex

How to benchmark Boost Spirit Parser?


I'm working on a compiler and I would like to improve its performances. I found that about 50% of the time is spent parsing the source files. As the source file are quite small and I do quite a lot of transformations after that, it seems to me that it is perfectible.

My parser is a Boost Spirit parser with a lexer (with lexer::pos_iterator) and I have a middle sized grammar. I'm parsing the source into an AST.

My problem is that I have no idea what takes the most time during the parsing: copies of AST nodes, lexer, parser rules or memory.

I don't think that it is I/O problem since I'm working on a SSD and that I'm reading the file entirely at the beginning and then using only the memory version.

I tried using profilers, but the methods that takes time are some methods from Boost with names hundreds of characters long and I don't know exactly what they do...

So, is there a preferred way to benchmark a Boost Spirit Parser and its grammar ? Or is there some rules that can be used to verify the efficiency at some specific points ?

Thanks

Sources for those interested:


Solution

  • I have given things a quick scan.

    My profiler quickly told me that constructing the grammar and (especially) the lexer object took quite some resources.

    Indeed, just changing a single line in SpiritParser.cpp saved 40% of execution time1 (~28s down to ~17s):

        lexer::Lexer lexer;
    

    into

        static const lexer::Lexer lexer;
    

    Now,

    Anyways, implementing these further changes2 shaved off another ~15% of execution time:

    static const lexer::Lexer lexer;
    static const parser::EddiGrammar grammar(lexer); 
    
    try {
        bool r = spirit::lex::tokenize_and_parse(
                position_begin, position_end,
                lexer, 
                grammar(boost::phoenix::cref(position_begin)),
                program);
    

    Loose ideas:

    Hope this helps.


    [1] When parsing all test cases in test/cases/*.eddi 100 times like so (github):

    for (int i=0; i<100; i++)
        for (auto& fname : argv)
    {
        eddic::ast::SourceFile program;
        std::cout << fname << ": " << std::boolalpha << parser.parse(fname, program, nullptr) << "\n";
    }
    

    Timed with a simple

    time ./test ../../test/cases/*.eddi | md5sum
    

    With the md5sum acting as a sanity check.

    [2] I created a pull request with the proof-of-concept refactorings here https://github.com/wichtounet/eddic/pull/52