Goal: The health actuators of our microservices include ping tests to external services. Naturally, these ping requests sometimes time out or fail for other reasons, which leads to a DOWN response from the /health actuator. We would like SBA Server to ignore one or two of those DOWN responses and keep the service's status at UP until, say, three or more consecutive DOWN responses have been received.
I studied all the available configuration options for SBA Server, but to be honest, I am confused about how they all play together and how I can achieve this goal. I naively thought the default-retries setting would accomplish this, but it didn't. I assume (hope) the desired behavior can be achieved by combining several of those values, as there doesn't seem to be a single one that would achieve this goal.
For reference, these are the configuration options that I think may apply here:
Any thoughts?
My suggestion would be to adapt your actuator healthcheck, which does ping the external services, to cache the last ping responses and only return DOWN after a few consecutive timeouts.
This way Spring Boot Admin stays simple, and the logic for this special case is kept in the place where it actually belongs.
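To make the suggestion concrete, here is a minimal sketch of the "only report DOWN after a few consecutive failures" idea in plain Java. The class name `TolerantPingCheck`, the `ping` supplier, and the threshold of 3 are all made up for illustration; they are not part of Spring Boot or Spring Boot Admin, and your real ping logic would go where the supplier is.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper: wraps a ping and reports "down" only once the ping
 * has failed `threshold` times in a row. A single success resets the count.
 */
class TolerantPingCheck {
    private final BooleanSupplier ping;   // assumed: returns true if the external service answered
    private final int threshold;          // consecutive failures needed before reporting DOWN
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    TolerantPingCheck(BooleanSupplier ping, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.ping = ping;
        this.threshold = threshold;
    }

    /** Runs one ping and returns true while the service should still count as UP. */
    boolean isUp() {
        boolean ok;
        try {
            ok = ping.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false; // treat timeouts and other errors like a failed ping
        }
        if (ok) {
            consecutiveFailures.set(0);
            return true;
        }
        // Still UP until the failure streak reaches the threshold.
        return consecutiveFailures.incrementAndGet() < threshold;
    }
}
```

In a Spring Boot service you would presumably call something like this from your own `HealthIndicator` implementation and return `Health.up().build()` or `Health.down().build()` depending on the result. Note that the streak only advances when the health endpoint is actually polled, so how long a real outage takes to show up depends on how often SBA Server checks the endpoint.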