I have an S3 bucket containing thousands of folders, each named after a table and holding Parquet files. I'm trying to create tables with those schemas in Redshift using the Redshift Data API, but I get the error below when trying to create more than 500 tables.
Error Category: UNCLASSIFIED_ERROR; Failed Line Number: 93; ActiveStatementsExceededException: An error occurred (ActiveStatementsExceededException) when calling the ExecuteStatement operation: Active statements exceeded the allowed quota (500).
For every CREATE TABLE statement, I'm calling this code:
import boto3

redshift_data_client = boto3.client('redshift-data')

# Submit one CREATE TABLE statement (runs asynchronously)
redshift_data_client.execute_statement(
    Database='dev',
    Sql=statement,
    WorkgroupName='rss-workgroup'
)
I'm expecting those thousands of tables to be created without issue.
When calling execute_statement(), the command runs asynchronously. I suspect the problem is that previous commands are still running while you keep submitting more commands.
Your program can call describe_statement() to retrieve the current status of the commands it previously sent. The response includes 'Status': 'SUBMITTED'|'PICKED'|'STARTED'|'FINISHED'|'ABORTED'|'FAILED', which indicates whether the statement has finished. You should keep the number of statements in the SUBMITTED/PICKED/STARTED states under 500.
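Here is a minimal sketch of that approach. It assumes statements is a list of the CREATE TABLE strings and reuses the database and workgroup from the question; the MAX_ACTIVE value and the still_active() helper are illustrative, not part of the API:

import time
import boto3

redshift_data = boto3.client('redshift-data')

MAX_ACTIVE = 400   # stay safely under the 500 active-statement quota (assumed margin)
active_ids = []    # IDs of statements that may still be SUBMITTED/PICKED/STARTED

def still_active(statement_id):
    status = redshift_data.describe_statement(Id=statement_id)['Status']
    return status in ('SUBMITTED', 'PICKED', 'STARTED')

for statement in statements:
    # Before submitting more work, wait until there is room under the quota
    while len(active_ids) >= MAX_ACTIVE:
        active_ids = [sid for sid in active_ids if still_active(sid)]
        if len(active_ids) >= MAX_ACTIVE:
            time.sleep(5)

    response = redshift_data.execute_statement(
        Database='dev',
        Sql=statement,
        WorkgroupName='rss-workgroup'
    )
    active_ids.append(response['Id'])

Keeping the cap a little below 500 leaves headroom for statements that have been submitted but not yet reflected by describe_statement().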
Alternatively, you can use batch_execute_statement(), which takes multiple statements that are run serially. From the batch_execute_statement - Boto3 documentation:
The SQL statements are run as a single transaction. They run serially in the order of the array. Subsequent SQL statements don't start until the previous statement in the array completes. If any SQL statement fails, then because they are run as one transaction, all work is rolled back.
This way, you won't need to check the status of each previously-submitted request before submitting new requests.
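A sketch of that approach, under the same assumptions as above (statements holds the CREATE TABLE strings, same database and workgroup). A single batch_execute_statement() call only accepts a limited number of statements (40 at the time of writing), so the list still has to be chunked:

import boto3

redshift_data = boto3.client('redshift-data')

BATCH_SIZE = 40  # per-call limit on the number of Sqls entries

for i in range(0, len(statements), BATCH_SIZE):
    batch = statements[i:i + BATCH_SIZE]
    response = redshift_data.batch_execute_statement(
        Database='dev',
        Sqls=batch,
        WorkgroupName='rss-workgroup'
    )
    # Each batch runs serially as one transaction; the returned Id can be
    # passed to describe_statement() later if you want to check progress.
    print(response['Id'])

With a few thousand tables this submits only a few dozen statements in total, so the 500 active-statement quota is no longer a concern. Keep in mind that each batch is a single transaction, so one failing CREATE TABLE rolls back its entire batch.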