sqlalchemyodbcpyodbcclouderaimpala

Issue with SQLAlchemy accessing Impala database via cloudera ODBC DSN


I'm trying to access an Impala DB via SQLAlchemy - I have configured a DSN that allows me to connect to the DB when using directly pyodbc.

However when using SQLAlchemy I get an error:

When using a db called datamart_x in the DSN:

pyodbc.Error: ('HY000', '[HY000] [Cloudera][ImpalaODBC] (370) Query analysis error occurred during query execution: [HY000] : AnalysisException: datamart_x.schema_name() unknown for database datamart_x. Currently this db has 0 functions.\n (370) (SQLExecDirectW)')

sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY000', '[HY000] [Cloudera][ImpalaODBC] (370) Query analysis error occurred during query execution: [HY000] : AnalysisException: datamart_x.schema_name() unknown for database datamart_x. Currently this db has 0 functions.\n (370) (SQLExecDirectW)') [SQL: SELECT schema_name()]

It is not a permission issue - as I can connect directly to the DB with pyodbc using the same DSN.

I suspect the issue is in the SQL statement: SELECT schema_name() that is executed when the SQLAlchemy engine is access ed (e.g. in my case with pandas read_sql)

Any ideas if there are connection parameters to get this to work?

Below a the code that I use to create the SQLAlchemy engine

connection_string = 'mssql+pyodbc://DataLake'
SQL = 'SHOW DATABASES'
args = {'autocommit': True}
engine = create_engine(connection_string, connect_args=args)
df = pd.read_sql(SQL, engine)

Kind Regards, Ernst

Tried different kind of connection strings .

Update Enabled logging for the sqlalchemy module(below): This seems to confirm my suspicion - as the SELECT schema_name() Bu I still don't know how to prevent sqlalchemy from running this sql statement.

INFO:sqlalchemy.engine.Engine:SELECT CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
INFO:sqlalchemy.engine.Engine:[raw sql] ()
INFO:sqlalchemy.engine.Engine:ROLLBACK
INFO:sqlalchemy.engine.Engine:SELECT schema_name()
INFO:sqlalchemy.engine.Engine:[generated in 0.00042s] ()
INFO:sqlalchemy.engine.Engine:ROLLBACK

Solution

  • You are trying to use mssql+pyodbc://, which is the dialect for Microsoft SQL Server. You'll need to use a dialect specifically for Impala. Usage examples can be found here:

    https://github.com/cloudera/impyla/blob/master/impala/tests/test_sqlalchemy.py