I'm implementing a simple app that changes column names (and leaves table names alone) in SQL statements. The statement is passed as a String
, and modified one is also returned as a String
, there's no database connection involved.
To achieve this, I'm using Apache Calcite's SQL parser. I parse the SQL string to the SqlNode
, accept a SqlVisitor
that creates renamed SqlNode
, and then write everything back to String
(using SqlNode.toSqlString()
).
The problem is that I don't know how to tell the difference between columns and tables in parsed SqlNode
object while accepting a SqlVisitor
. Both are represented as SqlIdentifier
, having the same SqlKind
. Hence, when SqlVisitor
is visiting the SqlIdentifier
, it will rename it whether it's a column or a table.
private String changeNames(String str) throws SqlParseException {
SqlShuttle visitor = new SqlShuttle() {
private String rename(String str) {
return str + "-test";
}
@Override
public SqlNode visit(SqlIdentifier identifier) {
SqlIdentifier output = new SqlIdentifier(rename(identifier.getSimple()), identifier.getCollation(), identifier.getParserPosition());
return output;
}
};
SqlParser.ConfigBuilder configBuilder = SqlParser.configBuilder();
configBuilder.setLex(Lex.MYSQL);
SqlParser.Config config = configBuilder.build();
SqlParser parser = SqlParser.create(str, config);
SqlNode parsedStatement = parser.parseQuery(str);
SqlNode outputNode = parsedStatement.accept(visitor);
return outputNode.toSqlString(SqlDialect.DUMMY).getSql();
}
for example
SELECT name, address, age FROM mytablename WHERE age = 23 AND name = 'John'
will be modified into
SELECT `name-test`, `address-test`, `age-test` FROM `mytablename-test` WHERE `age-test` = 23 AND `name-test` = 'John'
How could I tell if given SqlIdentifier
is a column or a table ?
To resolve identifiers to tables and columns, and figure out their types, you will need to use Calcite's validator (SqlValidator
). The validator understands SQL name resolution rules (e.g. whether an aliases in a FROM clause can be seen in a sub-query) whereas we intentionally did not make the parser, and the SqlNode
data structure it produces, aware of such things.
The two key concepts in the validator are scopes (SqlValidatorScope
) and namespaces (SqlValidatorNamespace
).
A scope is where you are standing and trying to resolve an identifier. For example, you might be in the SELECT clause of a query. Or in the WHERE clause of a particular sub-query. You will be able to see different collections of tables and columns in different scopes. Even a GROUP BY clause and ORDER BY clause have different scopes.
A namespace is something that looks like a table, and has a list of columns. It might be a table or, say, a sub-query in the FROM clause. If you are in a scope, you can look up a table alias, get a namespace, then look at what columns it has.
For your purposes, it would be useful if there was a variant of SqlShuttle
that knew exactly which scope you are in, and where you could ask identifiers to be expanded into table and column references. Unfortunately no one has build such a thing yet.