javamongodbantlr4mongo-javamongo-java-driver

Parse sql query using antlr parsetree to mongo bson document in Java


I have a SQL like query example:

Select id,name from employee where age > 30 and department = 'IT' limit 200

The SQL query grammer is defined in an ANTLR4 grammar file. Is there any implementation that converts the parse tree of this query to a bson document?

The bson document will then be used to query a mongo db.


Solution

  • In one of my previous jobs I did something similar: got a query (not an sql, but pretty similar) and translated it to mongo query with antlr.

    I don't have a code to share, However I can share my thoughts:

    1. Mongo is not SQL compliant, so you can't just take a sql grammar. What about JOINs and all the relational algebra? What about aggregations that are pretty tricky in mongo with their aggregation framework? In the opposite direction, how do you generate SQL that gets translated to "exists" clause in mongo. There are many things like this, some are small, some are huge, but bottom line you must be talking about some kind of subset of sql ,some DSL that is allowed to be used as a query language and looks "like" an sql because people are used to SQL.

    2. With that in mind, you should create your own grammar and Antlr will generate a lexer/parser for you. You'll also get for granted a syntax check of the query in Runtime. Antlr won't be able to parse the query if its not in a correct format obviously, some grammar rule will fail. This is an another reason to not take SQL "as is".

    3. So far so good, you've created your own listener / visitor. In my case I've opted for creating an object representation of the query with internal state and everything. So the query

    Select id,name 
    from employee 
    where age > 30 
     and department = 'IT' 
    limit 200
    

    Was translated to objects of type:

    
    class Query {
       private SelectClause select;
       private FromClause  from;
       private WhereClause where;
       private Limit        limit;
    }
    
    class SelectClause {
       private List<String> fields;
    }
    ...
    class WhereClause {
       Condition root;
    }
    
    interface Condition {
    ...
    }
    
    class AndCondition implements Condition { // the same for Not, Or
    
    }
    

    For this particular query its something like:

    Query q = new Query(new SelectClause(["id", "name"]), new FromClause("employee"), new WhereClause(new AndCondition(new SimpleLeafCondition("age", Operators.GT, 30), new  SimpleLeafCondition("department", Operators.EQ, "IT" )), new Limit(30));
    

    Then Its possible to make some optimizations in the query (like embedding of where clauses if you need, or, for example, manipulating the "For" part if you're working multi tenant environment and have different collections for different tenants).

    After all you can go with design pattern "interpreter" and recursively parse the query objects and "translate" them to valid mongo query. I remember that this step took me something like 1 day to accomplish (it was 7 years ago with mongo 2 I guess, but still), given the correct structure of objects representing the query, so this should not be that complicated. I'm bringing this up, because It looks like its your primary concern in the question.