pythonamazon-redshiftgreat-expectations

Great Expectations using schema name in query for Redshift


I'm having an issue where when great expectations builds a query string to a table_asset it doesn't use the schema name.

import great_expectations as gx
from sqlalchemy_extras.sqlalchemy_utils import get_credentials, get_connection_string

# this is a set of calls to our teams functions, don't worry too much about it
# the connection string will look like: 'redshift+psycopg2://USER:PASS@HOST:PORT/DB_NAME'
def get_gx_datasource(gx_context, db_name):
    settings = get_credentials().get(db_name)
    redshift_connection_string = str(get_connection_string(settings))
    return gx_context.sources.add_sql(connection_string=redshift_connection_string, name=db_name)


gx_context = gx.get_context()
expectation_suite = gx_context.add_expectation_suite(expectation_suite_name='my_suite')
gx_datasource = get_gx_datasource(gx_context, db_name='db_name)
gx_datasource.add_table_asset(
    name='bar',
    table_name='bar',
    schema_name='foo'
)

asset = gx_datasource.get_asset('bar')
asset.add_splitter_mod_integer(column_name='my_col', mod=10)
batch_request = asset.build_batch_request()
batches = gx_datasource.get_batch_list_from_batch_request(batch_request)

for batch in batches:
    print(batch.batch_spec)

The error I get is something like:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "bar" does not exist

[SQL: SELECT distinct(mod(CAST(my_col AS INTEGER), %(mod_1)s)) AS distinct_1 
FROM bar]
[parameters: {'mod_1': 10}]

But while that query doesn't work when testing against my connection to redshift, the query does work if I change it to add the schema name like foo.bar.

But nothing I do seems to work.

Not this:

gx_datasource.add_table_asset(
    name='bar',
    table_name='foo.bar',
    schema_name='foo'
)

Or this:

gx_datasource.add_table_asset(
    name='bar',
    table_name='foo.bar'
)

And not directly editing the data in the table_asset object itself.

Am I missing something here?


Solution

  • It may not be what you are looking for but a Redshift session has a “search path” setting that can be used to set which schemas to search when one isn’t provided in the query. See: docs.aws.amazon.com/redshift/latest/dg/r_search_path.html

    You can alter a user to set their default search_path such as:

    ALTER USER bill SET search_path TO ‘$user’, public, foo;

    but you will need to be a superuser to perform this.