I'm trying to make a few microservices more resilient, and retrying certain types of HTTP requests would help with that.
Retrying timeouts will give clients a terribly slow experience, so I don't intend to retry in this case. Retrying 400s doesn't help because a bad request will remain a bad request a few milliseconds later.
I imagine there are other reasons to not retry a few other types of errors, but which errors and why?
There are some errors that should not be retried because they seem permanent:
So, most of the 4xx client errors should not be retried.
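As a rough illustration of "most, but not all", here is a minimal sketch of a retry check for 4xx responses. The set of exceptions (408 Request Timeout and 429 Too Many Requests) is my assumption about which client errors are commonly treated as transient, and the names `RETRYABLE_CLIENT_ERRORS` and `should_retry_client_error` are hypothetical. Adjust the set to your services; for example, drop 408 if you never want to retry timeouts.

```python
# Assumption: these are the only 4xx codes worth retrying; everything else
# in the 4xx range is treated as permanent until the request itself changes.
RETRYABLE_CLIENT_ERRORS = {408, 429}


def should_retry_client_error(status: int) -> bool:
    """Return True if a 4xx status looks transient enough to retry."""
    if not 400 <= status <= 499:
        raise ValueError(f"not a 4xx status: {status}")
    return status in RETRYABLE_CLIENT_ERRORS
```

With this set, `should_retry_client_error(400)` and `should_retry_client_error(404)` return `False`, while `should_retry_client_error(429)` returns `True`. For 429 you would normally also honor the `Retry-After` header, if present, before trying again.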
The 5xx server errors that should not be retried:
However, to make the microservices more resilient, you should use the circuit breaker pattern and fail fast when the upstream is down.
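To make the fail-fast idea concrete, here is a minimal, single-threaded sketch of a circuit breaker. It is not taken from any particular library: the class name, the `failure_threshold` and `reset_timeout` parameters, and their default values are all assumptions chosen for illustration. After a number of consecutive failures the breaker opens and rejects calls immediately; once the timeout has passed it lets one trial call through and closes again only if that call succeeds.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected unattempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: do not touch the upstream at all.
                raise CircuitOpenError("upstream considered down")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

You would wrap each upstream request in something like `breaker.call(send_request, url)`, where `send_request` is whatever function performs the HTTP call and raises on a retryable failure. A production version would also need locking for concurrent callers and, usually, one breaker instance per upstream.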