google-cloud-spanner

Should we retry a query if Spanner returns a ResourceExhausted error?


Recently I have seen the following error in my logs while using the ReadWriteTransaction method from the Spanner library:

*status.Error: rpc error: code = ResourceExhausted desc = Failing fast as CPU overload detected on server

I am using Google's official Go client library.

I searched the Google Cloud Go GitHub repository and found the following pull request: https://github.com/googleapis/google-cloud-go/pull/9739/files

Should we check for that error and retry the operation in Spanner?



Solution

  • TLDR: Yes

    RESOURCE_EXHAUSTED errors should be retried automatically by the client if you use the default client configuration. See https://github.com/googleapis/google-cloud-go/pull/11450 for a test that verifies this.

    Retries for this error are handled at two layers by the Go client:

    1. For (streaming) reads and queries, it is handled by a retryer inside the client library.
    2. For unary RPCs, like ExecuteSql or BatchCreateSessions, it is handled automatically by the gRPC libraries. ResourceExhausted is marked as a retryable code in the client's default configuration.

    Are you seeing the ResourceExhausted error propagated to your application? If so, you may be using a custom timeout/retry configuration for your client that does not include ResourceExhausted as a retryable error code.

    If you only see it in logs that automatically record the status codes returned by the executed RPCs, and not in your application, then that is (probably) an indication that the client is already retrying these errors for you.