apache-calcite sql-parser

Calcite: parse SQL into parts for multiple data sources


In my case I'm querying data from multiple data sources (for example CSV + MySQL) with a single SQL statement. Using Calcite, how can I tell which data source each table belongs to, and detect which columns are queried on each table? (The metadata of the data sources is available.)

The result I need is something like:
- TableA(col1, col2, col3) -> Data source CSV
- TableB(col1, colx, coly) -> Data source MySQL

My case is similar to what Apache Drill (which uses Calcite) does. I tried reading the Drill source code, but I could not find how Drill resolves these relations.

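import org.apache.calcite.rel.RelRoot;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

String sql = "select c.c1, m.c2 from csv.tbl as c, mysql.schema.tbl as m where c.id = m.id";

// rootSchema is my SchemaPlus, with the csv and mysql schemas registered on it
Frameworks.ConfigBuilder configBuilder = Frameworks.newConfigBuilder();
configBuilder.defaultSchema(rootSchema);
FrameworkConfig frameworkConfig = configBuilder.build();
Planner planner = Frameworks.getPlanner(frameworkConfig);

SqlNode sqlNode = planner.parse(sql);
SqlNode validated = planner.validate(sqlNode); // keep the validated tree
RelRoot relRoot = planner.rel(validated);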

This is what I have so far, but I can't find the information I need in the result.

Thanks a lot.


Solution

  • If your question is whether Calcite can automatically work out which columns you're using when that information is not in your SQL query, it can't. It assumes you're using the default schema and tries to resolve names there. With multiple schemas it doesn't try to be clever, and you have to tell it what to do: write the SQL query so that it contains that information, just like you did.

    If you want to extract that information, you have to walk the resulting relational tree with a RelVisitor, like I did in my master's thesis. You can find the code here and the related issue here.
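
For illustration, here is a minimal sketch of what such a RelVisitor could look like. It is not the code from the thesis. The class name `UsedColumns`, the `collect` and `demoPlan` helpers, and the in-memory `CSV` and `MYSQL` schemas are all made up for this example, and it assumes a reasonably recent Calcite version on the classpath. It looks only at Project, Filter and Join nodes and asks `RelMetadataQuery.getColumnOrigins` which base-table column each referenced field comes from, so columns used only by other operators (an Aggregate, for instance) are not reported. The first element of each qualified table name is the schema, which is what identifies the data source.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.calcite.plan.RelOptTable;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelVisitor;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.Join;
import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rel.metadata.RelColumnOrigin;
import org.apache.calcite.rel.metadata.RelMetadataQuery;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.Table;
import org.apache.calcite.schema.impl.AbstractSchema;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.apache.calcite.util.ImmutableBitSet;

/** Hypothetical helper: maps each table's qualified name to the columns the plan references. */
class UsedColumns extends RelVisitor {
    private final Map<List<String>, Set<String>> used = new LinkedHashMap<>();
    private final RelMetadataQuery mq;

    private UsedColumns(RelMetadataQuery mq) { this.mq = mq; }

    public static Map<List<String>, Set<String>> collect(RelNode root) {
        UsedColumns visitor = new UsedColumns(root.getCluster().getMetadataQuery());
        visitor.go(root);
        return visitor.used;
    }

    @Override public void visit(RelNode node, int ordinal, RelNode parent) {
        if (node instanceof Project) {
            Project p = (Project) node;
            for (RexNode e : p.getProjects()) {
                record(p.getInput(), RelOptUtil.InputFinder.bits(e));
            }
        } else if (node instanceof Filter) {
            Filter f = (Filter) node;
            record(f.getInput(), RelOptUtil.InputFinder.bits(f.getCondition()));
        } else if (node instanceof Join) {
            // Assumption: inner/outer join, whose output row is left fields then right fields.
            record(node, RelOptUtil.InputFinder.bits(((Join) node).getCondition()));
        }
        super.visit(node, ordinal, parent); // descend into the inputs
    }

    /** Resolves field indexes of rel's output row back to base-table columns. */
    private void record(RelNode rel, ImmutableBitSet fields) {
        for (int i : fields) {
            Set<RelColumnOrigin> origins = mq.getColumnOrigins(rel, i);
            if (origins == null) continue; // lineage unknown for this operator
            for (RelColumnOrigin o : origins) {
                RelOptTable t = o.getOriginTable();
                String col = t.getRowType().getFieldNames().get(o.getOriginColumnOrdinal());
                used.computeIfAbsent(t.getQualifiedName(), k -> new LinkedHashSet<>()).add(col);
            }
        }
    }

    /** Column-only table for the demo; it is planned but never executed. */
    private static Table table(String... columns) {
        return new AbstractTable() {
            @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
                RelDataTypeFactory.Builder b = typeFactory.builder();
                for (String c : columns) b.add(c, SqlTypeName.INTEGER);
                return b.build();
            }
        };
    }

    /** Plans sql against two made-up schemas standing in for the CSV and MySQL sources. */
    public static RelNode demoPlan(String sql) throws Exception {
        SchemaPlus root = Frameworks.createRootSchema(true);
        root.add("CSV", new AbstractSchema()).add("TBL", table("ID", "C1"));
        root.add("MYSQL", new AbstractSchema()).add("TBL", table("ID", "C2"));
        Planner planner = Frameworks.getPlanner(
            Frameworks.newConfigBuilder().defaultSchema(root).build());
        return planner.rel(planner.validate(planner.parse(sql))).rel;
    }

    public static void main(String[] args) throws Exception {
        RelNode plan = demoPlan(
            "select t1.c1, t2.c2 from csv.tbl as t1, mysql.tbl as t2 where t1.id = t2.id");
        collect(plan).forEach((name, cols) -> System.out.println(name + " -> " + cols));
    }
}
```

In the question's setup you would skip `demoPlan` and call `UsedColumns.collect(relRoot.rel)` directly. Note that with Calcite's default parser settings unquoted identifiers are upper-cased, which is why the demo schemas use upper-case names.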