I am trying to use AWS rds-data api to perform some heavy etl in aurora Serverless Postgresql.
As per AWS Rds DATA documentation, By default, a call times out if it's not finished processing within 45 seconds. However, you can continue running a SQL statement if the call times out by using the continueAfterTimeout parameter.
I see that boto3 rds-data supports the continueAfterTimeout parameter (a boolean). And I can use this flag in a transaction like this.
def execute_transaction_query(sql, transaction_id):
print(sql)
response = rds_client.execute_statement(
secretArn=AURORA_DB_CREDS_ARN,
resourceArn=AURORA_DB_ARN,
database=AURORA_DB_NAME,
sql=sql,
transactionId=transaction_id,
continueAfterTimeout=True, # boolean flag to continue after timeout in theory
)
However, the queries are still failling after 45 seconds with the error
An error occurred (StatementTimeoutException) when calling the ExecuteStatement operation: Request timed out
Ok so the reason rds-data calls fail is that the continueAfterTimeout=True
doesnt mean the boto3 call wont fail, just that the sql query running on the database will continue running.
So what you need to do when running rds-data etls is to execute statements in a try/catch block:
response = rds_client.execute_statement(
secretArn=AURORA_DB_CREDS_ARN,
resourceArn=AURORA_DB_ARN,
database=AURORA_DB_NAME,
sql=sql,
transactionId=transaction_id,
continueAfterTimeout=True,
)
except botocore.exceptions.ClientError as error:
# aurora fails automatically after 45 seconds but continues in the db
#https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
if error.response['Error']['Code'] == 'StatementTimeoutException':
print('QUERY TIMEDOUT AFTER MAX 45 SECONDS. THIS IS FINE')
# arbitrary wait in case the commit transaction fails with timeout
time.sleep(60)
else:
raise Exception(error)