javasqlparsingapache-calcite

apache calcite distinguish column names from table name


I'm implementing a simple app that changes column names (and leaves table names alone) in SQL statements. The statement is passed as a String, and modified one is also returned as a String, there's no database connection involved.

To achieve this, I'm using Apache Calcite's SQL parser. I parse the SQL string to the SqlNode, accept a SqlVisitor that creates renamed SqlNode, and then write everything back to String (using SqlNode.toSqlString()).

The problem is that I don't know how to tell the difference between columns and tables in parsed SqlNode object while accepting a SqlVisitor. Both are represented as SqlIdentifier, having the same SqlKind. Hence, when SqlVisitor is visiting the SqlIdentifier, it will rename it whether it's a column or a table.

private String changeNames(String str) throws SqlParseException {
    SqlShuttle visitor = new SqlShuttle() {
        private String rename(String str) {
            return str + "-test";
        }

        @Override
        public SqlNode visit(SqlIdentifier identifier) {
            SqlIdentifier output = new SqlIdentifier(rename(identifier.getSimple()), identifier.getCollation(), identifier.getParserPosition());
            return output;
        }
    };

    SqlParser.ConfigBuilder configBuilder =  SqlParser.configBuilder();
    configBuilder.setLex(Lex.MYSQL);
    SqlParser.Config config = configBuilder.build();

    SqlParser parser = SqlParser.create(str, config);
    SqlNode parsedStatement = parser.parseQuery(str);
    SqlNode outputNode = parsedStatement.accept(visitor);

    return outputNode.toSqlString(SqlDialect.DUMMY).getSql();
}

for example

SELECT name, address, age FROM mytablename WHERE age = 23 AND name = 'John'

will be modified into

SELECT `name-test`, `address-test`, `age-test` FROM `mytablename-test` WHERE `age-test` = 23 AND `name-test` = 'John'

How could I tell if given SqlIdentifier is a column or a table ?


Solution

  • To resolve identifiers to tables and columns, and figure out their types, you will need to use Calcite's validator (SqlValidator). The validator understands SQL name resolution rules (e.g. whether an aliases in a FROM clause can be seen in a sub-query) whereas we intentionally did not make the parser, and the SqlNode data structure it produces, aware of such things.

    The two key concepts in the validator are scopes (SqlValidatorScope) and namespaces (SqlValidatorNamespace).

    A scope is where you are standing and trying to resolve an identifier. For example, you might be in the SELECT clause of a query. Or in the WHERE clause of a particular sub-query. You will be able to see different collections of tables and columns in different scopes. Even a GROUP BY clause and ORDER BY clause have different scopes.

    A namespace is something that looks like a table, and has a list of columns. It might be a table or, say, a sub-query in the FROM clause. If you are in a scope, you can look up a table alias, get a namespace, then look at what columns it has.

    For your purposes, it would be useful if there was a variant of SqlShuttle that knew exactly which scope you are in, and where you could ask identifiers to be expanded into table and column references. Unfortunately no one has build such a thing yet.