
Prometheus histogram_quantile for request times


I have a clustered service with multiple instances, and my REST client records request durations via Micrometer (in my case the bucket metric is called http_rest_call_time_bucket).

I was wondering how to build a chart that shows the 0.9 percentile (p90) for each request path, calculated across the whole cluster. Is that even possible?

So far I have the following query:

histogram_quantile(0.9, sum by(le, method, url) ( rate(http_rest_call_time_bucket{}[5m])))

However, I am not 100% sure it is correct, because my cluster currently doesn't produce much data to verify it against. Is it OK that I drop instance and aggregate only by le, method, and url?


Solution

  • I was wondering how to build a chart that shows the 0.9 percentile (p90) for each request path, calculated across the whole cluster. Is that even possible?

    Yes, it is. Sum the bucket rates by le and uri (so instance is aggregated away), then apply histogram_quantile to the result to get the percentile. You can also visualize the buckets as a heatmap. Here is an example project that does both: https://github.com/jonatan-ivanov/teahouse

    The dashboard that does this is tea-api.json. An example query (filtered by application and uri) could look like this:

    histogram_quantile(0.99, sum(rate(http_server_requests_seconds_bucket{application=~"$application", uri=~"$uri"}[$__rate_interval])) by (le))
    

    You can see it in action in this conference talk: https://www.youtube.com/watch?v=HQHuFnKvk_U&t=39m2s (I recommend watching the whole talk.)

    Also make sure that your uri label is templated (e.g. /users/{id} rather than /users/42) instead of containing the raw URL, so that you don't run into high-cardinality problems: https://develotters.com/posts/high-cardinality/
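To illustrate why dropping instance while keeping le is safe, here is a minimal Python sketch of what `sum by (le) (...)` followed by `histogram_quantile` does. The helpers `merge_buckets` and `histogram_quantile` below are hypothetical illustrations, not a Prometheus API, and the quantile function is simplified (it assumes positive bucket bounds, a `math.inf` bucket, and 0 < q <= 1). Rates are just counts divided by the same time window, so plain counts give the same quantile. Because cumulative bucket counters are additive, summing them per le across instances yields the cluster-wide distribution; averaging per-instance percentiles does not.

```python
import math
from collections import defaultdict


def merge_buckets(per_instance):
    """Sum cumulative bucket counts per upper bound (le) across instances,
    mirroring `sum by (le) (rate(..._bucket[5m]))`."""
    merged = defaultdict(float)
    for buckets in per_instance:
        for le, count in buckets.items():
            merged[le] += count
    return dict(merged)


def histogram_quantile(q, buckets):
    """Simplified histogram_quantile: find the first bucket whose cumulative
    count reaches rank q * total and interpolate linearly inside it."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    total = buckets[math.inf]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket assumed to start at 0
    for le in sorted(buckets):  # math.inf sorts last
        count = buckets[le]
        if count >= rank:
            if math.isinf(le):
                # Prometheus returns the highest finite upper bound here
                return prev_bound
            # Linear interpolation within [prev_bound, le]
            return prev_bound + (le - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = le, count
    return prev_bound


# Hypothetical bucket counts (seconds) for two instances serving the same uri.
instance_a = {0.1: 80, 0.5: 95, 1.0: 100, math.inf: 100}
instance_b = {0.1: 10, 0.5: 60, 1.0: 100, math.inf: 100}

cluster_p90 = histogram_quantile(0.9, merge_buckets([instance_a, instance_b]))
averaged_p90 = (histogram_quantile(0.9, instance_a) + histogram_quantile(0.9, instance_b)) / 2

print(round(cluster_p90, 4))   # 0.7778
print(round(averaged_p90, 4))  # 0.6208 (averaging quantiles is not the cluster p90)
```

Keeping extra labels such as method and url (or uri) in the `by` clause simply produces one merged bucket set, and therefore one p90 series, per label combination.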