amazon-dynamodb aws-sdk-go-v2

Why do I get ProvisionedThroughputExceededException in DynamoDB if the AWS SDK for Go implements exponential back-off?


I'm sorry that I can't post my code snippets. I have a Go script that scans through a DynamoDB table and modifies its entries. Everything is done sequentially (no goroutines are involved). However, when I ran it against a large table, I got a ProvisionedThroughputExceededException. I'm running the script locally.

I'm using aws-sdk-go-v2, which should have a 20-second exponential back-off implementation when this error is triggered. Since provisioned write capacity is allocated on a per-second basis, shouldn't the SDK automatically make the script wait when the capacity is exhausted, until the next second when new capacity is allocated? I'm using UpdateItem, PutItem, and DeleteItem operations.

One guess I have is that when I send many requests in a short amount of time, they consume capacity in the future, while the database is still busy processing requests made in the past. However, I got the exception after a few seconds of execution, which is much shorter than 20 seconds.

What's the proper way of handling this exception? Catching it, waiting a few seconds, and retrying feels a bit arbitrary. I don't understand why the SDK isn't taking care of this already.


Solution

  • The Go API documentation (e.g., see https://github.com/aws/aws-sdk-go/blob/main/service/dynamodb/errors.go) claims that "The Amazon Web Services SDKs for DynamoDB automatically retry requests that receive this exception [ProvisionedThroughputExceededException]. Your request is eventually successful, unless your retry queue is too large to finish." In your case there is no parallelism and only one outstanding request at any time, so the retry queue holds just one item. Given all of this, you are right: you should not be seeing ProvisionedThroughputExceededException at all, or at least not without a 20-second delay first.

    My only guess as to why you're seeing it is the parameter DefaultMaxAttempts int = 3. My guess (which I can't base on any code, as I'm not familiar with this Go library) is that the retries never reach a full 20-second wait: three attempts allow only two retries, and the back-off delays before those retries add up to much less than 20 seconds. If this is the case, can you please try increasing this "max attempts" parameter and see whether it helps (at least by extending the retry period to the full 20 seconds)?