I'm having an issue where when great expectations builds a query string to a table_asset
it doesn't use the schema name.
import great_expectations as gx
from sqlalchemy_extras.sqlalchemy_utils import get_credentials, get_connection_string
# this is a set of calls to our teams functions, don't worry too much about it
# the connection string will look like: 'redshift+psycopg2://USER:PASS@HOST:PORT/DB_NAME'
def get_gx_datasource(gx_context, db_name):
settings = get_credentials().get(db_name)
redshift_connection_string = str(get_connection_string(settings))
return gx_context.sources.add_sql(connection_string=redshift_connection_string, name=db_name)
gx_context = gx.get_context()
expectation_suite = gx_context.add_expectation_suite(expectation_suite_name='my_suite')
gx_datasource = get_gx_datasource(gx_context, db_name='db_name)
gx_datasource.add_table_asset(
name='bar',
table_name='bar',
schema_name='foo'
)
asset = gx_datasource.get_asset('bar')
asset.add_splitter_mod_integer(column_name='my_col', mod=10)
batch_request = asset.build_batch_request()
batches = gx_datasource.get_batch_list_from_batch_request(batch_request)
for batch in batches:
print(batch.batch_spec)
The error I get is something like:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "bar" does not exist
[SQL: SELECT distinct(mod(CAST(my_col AS INTEGER), %(mod_1)s)) AS distinct_1
FROM bar]
[parameters: {'mod_1': 10}]
But while that query doesn't work when testing against my connection to redshift, the query does work if I change it to add the schema name like foo.bar
.
But nothing I do seems to work.
Not this:
gx_datasource.add_table_asset(
name='bar',
table_name='foo.bar',
schema_name='foo'
)
Or this:
gx_datasource.add_table_asset(
name='bar',
table_name='foo.bar'
)
And not directly editing the data in the table_asset
object itself.
Am I missing something here?
It may not be what you are looking for but a Redshift session has a “search path” setting that can be used to set which schemas to search when one isn’t provided in the query. See: docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
You can alter a user to set their default search_path such as:
ALTER USER bill SET search_path TO ‘$user’, public, foo;
but you will need to be a superuser to perform this.