c++boost-spiritqi

Unable to parse SQL type where condition using boost::spirit::qi


I may be asking a very trivial question, but I cannot work out how to crack it. I am trying to parse a SQL-like where clause, as shown below, using boost::spirit::qi to generate a vector of pairs:

std::string input = "book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'";

I have gone through the following threads but am still unable to do it :-(

[Thread1][6]
[Thread2][7]
[Thread3][8]
[Thread4][9]
[Thread5][10]

I would genuinely appreciate help understanding how to achieve this. Maybe I have not given it my full 100%, but please be kind.

Here is the full code (the commented-out parts are what I eventually want to do). As a first step I was just checking whether I can get all tokens into a vector; I would then parse each vector element to generate another vector of std::pair.

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>

namespace qi    = boost::spirit::qi;
namespace phx   = boost::phoenix;

typedef std::string str_t;
typedef std::pair<str_t, str_t> pair_t;
typedef std::vector<pair_t> pairs_t;

typedef std::vector<str_t> strings_t;
//typedef std::map<std::string, std::string> map_t;
//typedef std::vector<map_t> maps_t;

template <typename It, typename Skipper = qi::space_type>
    //struct parser : qi::grammar<It, pairs_t(), Skipper>
    struct parser : qi::grammar<It, strings_t(), Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        cond    = lexeme [ *(char_) ];
        conds   =  *(char_) >> cond % (lit("and"));

        //conds =  *(char_ - lit("and")) >>(cond % lit("and"));
        /*cond  = lexeme [ *(char_ - lit("and")) ];
        cond    = key >> "=" >> value;
        key     = *(char_ - "=");
        value   = ('\'' >> *(~char_('\'')) >> '\'');
        kv_pair = key >> value;*/
        start   = conds;
        //cond  = key >> "=" >> value;
        //key       = *(char_ - "=");
        //value = ('\'' >> *(~char_('\'')) >> '\'');
  //      kv_pair   = key >> value;
  //      start = kv_pair;
    }

  private:
    qi::rule<It, str_t(), Skipper> cond;
    qi::rule<It, strings_t(), Skipper> conds;
    //qi::rule<It, std::string(), Skipper> key, value;//, cond;
    //qi::rule<It, pair_t(), Skipper> kv_pair;
    //qi::rule<It, pairs_t(), Skipper> start;
    qi::rule<It, strings_t(), Skipper> start;
};

template <typename C, typename Skipper>
    bool doParse(const C& input, const Skipper& skipper)
{
    auto f(std::begin(input)), l(std::end(input));

    parser<decltype(f), Skipper> p;
    strings_t data;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,skipper,data);
        if (ok)   
        {
            std::cout << "parse success\n";
            std::cout << "No Of Key-Value Pairs=  "<<data.size()<<"\n";
        }
        else    std::cerr << "parse failed: '" << std::string(f,l) << "'\n";
        return ok;
    } 
    catch(const qi::expectation_failure<decltype(f)>& e)
    {
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return false;
}

int main()
{
    std::cout<<"Pair Test \n";
    const std::string input = "book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'";
    bool ok = doParse(input, qi::space);
    std::cout<< input <<"\n";
    return ok? 0 : 255;
}

OUTPUT:

Pair Test
parse success
No Of Key-Value Pairs=  2
book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'

I expected 4, since there are 4 conditions!

Thanks in advance. Regards, Vivek

An example to work from is live on Coliru.


Solution

  • I'm sorry to break it to you, but your grammar is far more broken than you imagined.

        conds   =  *(char_) // ...
    

    Right here, you're basically just parsing all the input into a single string, with whitespace skipped. In fact, adding

        for (auto& el : data)
            std::cout << "'" << el << "'\n";
    

    after parsing prints:

    Pair Test 
    parse success
    No Of Key-Value Pairs=  2
    'book.author_id='1234'andbook.isbn='xy99'andbook.type='abc'andbook.lang='Eng''
    ''
    

    As you can see, the first element is the string that *char_ parsed, and you get an empty element for free because both conds and cond also match on empty input.

    I would strongly suggest that you start simple. And I mean much simpler.

    Slowly build your grammar up from the ground. Spirit is a very good tool to tackle with test-driven development (except for the compile times, but hey, you get more time to think!).

    Here's something that I just made up, starting from the very first building block, the identifier, and working my way up to the higher-level elements:

    // lexemes (no skipper)
    ident     = +char_("a-zA-Z.");
    op        = no_case [ lit("=") | "<>" | "LIKE" | "IS" ];
    nulllit   = no_case [ "NULL" ];
    and_      = no_case [ "AND" ];
    stringlit = "'" >> *~char_("'") >> "'";
    
    // other productions
    field     = ident;
    value     = stringlit | nulllit;
    condition = field >> op >> value;
    
    conjunction = condition % and_;
    start       = conjunction;
    

    These are close to the simplest thing that I suppose could parse your grammar (with a few creative notes left and right, where they don't seem too intrusive).

    UPDATE So this is where I got in 20 minutes:

    I always start out with mapping the types that I want the rules to expose:

    namespace ast
    {
        enum op { op_equal, op_inequal, op_like, op_is };
    
        struct null { };
    
        typedef boost::variant<null, std::string> value;
    
        struct condition
        {
            std::string _field;
            op _op;
            value _value;
        };
    
        typedef std::vector<condition> conditions;
    }
    

    Only condition cannot be "naturally" used in a Spirit grammar without adaptation:

    BOOST_FUSION_ADAPT_STRUCT(ast::condition, (std::string,_field)(ast::op,_op)(ast::value,_value))
    

    Now comes the grammar itself:

        // lexemes (no skipper)
        ident       = +char_("a-zA-Z._");
        op_token.add
            ("=",    ast::op_equal)
            ("<>",   ast::op_inequal)
            ("like", ast::op_like)
            ("is",   ast::op_is);
        op          = no_case [ op_token ];
        nulllit     = no_case [ "NULL" >> attr(ast::null()) ];
        and_        = no_case [ "AND" ];
        stringlit   = "'" >> *~char_("'") >> "'";
    
        //// other productions
        field       = ident;
        value       = stringlit | nulllit;
        condition   = field >> op >> value;
    
        whereclause = condition % and_;
        start       = whereclause;
    

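    You can see minor deviations from my original sketch: ident now also accepts '_', the operators come from a symbol table (op_token) that maps each token to an ast::op value, and nulllit exposes an ast::null attribute via attr().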

    See it all Live And Working On Coliru, output:

    Pair Test 
    parse success
    No Of Key-Value Pairs=  4
    ( [book.author_id] = 1234 )
    ( [book.isbn] LIKE xy99 )
    ( [book.type] = abc )
    ( [book.lang] IS NULL )
    
    book.author_id = '1234' and book.isbn liKE 'xy99' and book.type = 'abc' and book.lang IS null