I'm trying to create an alert on GCP/Stackdriver using the http/server/response_count metric for App Engine. This metric has a response_code field that I can group_by:
fetch gae_app::appengine.googleapis.com/http/server/response_count
| filter metric.response_code>=500 && metric.response_code<600
| every 10m
| group_by [metric.response_code], sum(val())
But say I want to merge all 500+ responses into a single 5xx class of response and then aggregate them into one count for the range. Is it possible to pre-process the data so that the group_by in the example above yields a single time series, e.g. 5xx? I notice that one of the load balancer metrics has a "response_code_class" field of this kind, but it is NOT available for this metric.
After that, I'm looking for the ratio of 5xx requests to all requests. Would that even be possible with this metric?
Below is a query that does the following:
1. Uses group_by to count the 5xx responses in a 10-minute sliding window.
2. In the same group_by, also counts all responses in the same 10-minute sliding window.
3. After the group_by, simply computes the ratio of the two counts.
fetch gae_app
| metric 'appengine.googleapis.com/http/server/response_count'
| group_by [], sliding(10m), [
countAll: sum(response_count),
count5xx: sum(if(response_code>=500 && response_code < 600, response_count, 0))]
| value (count5xx / countAll)
| every 1m
[Screenshot: chart produced by a similar query]
The output of the above query is a ratio of 5xx responses to all responses.
Note: if you want to compute these ratios by zone, for example, simply add zone to the first argument of group_by, like this:
group_by [zone], sliding(10m), [countAll: ..., count5xx: ...] | value (count5xx / countAll)
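For the first part of the question (a single 5xx time series rather than a ratio), one possible approach is to keep only the count5xx aggregation from the ratio query above and drop the value step. This is an untested sketch that reuses the same constructs as that query, not a confirmed recipe; the empty group_by [] is what collapses every response_code into one series.

```
fetch gae_app
| metric 'appengine.googleapis.com/http/server/response_count'
# Empty group_by [] merges all series; the if() keeps only 5xx counts
# and contributes 0 for every other response code.
| group_by [], sliding(10m), [
count5xx: sum(if(response_code>=500 && response_code < 600, response_count, 0))]
| every 1m
```

Using if() instead of a filter on response_code means that windows with no 5xx responses should still produce a value of 0 rather than no data, which tends to be easier to alert on.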