google-cloud-platform stackdriver google-cloud-monitoring monitoring-query-language

Can the App Engine response count metric be grouped by response class?


I'm trying to create an alert on GCP/Stackdriver using the http/server/response_count metric for App Engine. This metric has a response_code field that I can group by:

fetch gae_app::appengine.googleapis.com/http/server/response_count
| filter metric.response_code>=500 && metric.response_code<600
| every 10m
| group_by [metric.response_code], sum(val())

But say I want to merge all 500+ responses into a single 5xx response class and then aggregate them into one count for the range. Is it possible to pre-process the data so that the group_by in the above example yields a single time series, e.g. 5xx? I notice that one of the load balancer metrics has a "response_code_class" field of this kind, but it is NOT available for this metric.

After that, I'm looking for a ratio of 5xx requests to all requests. Would that even be possible with this metric?


Solution

  • Below is a query that computes the ratio of 5xx responses to all responses over a sliding 10-minute window:

    fetch gae_app
    | metric 'appengine.googleapis.com/http/server/response_count'
    | group_by [], sliding(10m), [
        countAll: sum(response_count), 
        count5xx: sum(if(response_code>=500 && response_code < 600, response_count, 0))]
    | value (count5xx / countAll)
    | every 1m
    

    [Screenshot of the chart produced by a similar query]

    The output of the above query is a ratio of 5xx responses to all responses.
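
    If only the merged 5xx count is needed (the first part of the question) rather than the ratio, the same building blocks can be rearranged. The following is an untested sketch that reuses the filter from the question and the windowing from the query above; the column name count5xx is just a label carried over from that query:

    fetch gae_app
    | metric 'appengine.googleapis.com/http/server/response_count'
    | filter metric.response_code >= 500 && metric.response_code < 600
    | group_by [], sliding(10m), [count5xx: sum(response_count)]
    | every 1m

    Because the first argument of group_by is empty, every response code that passes the filter should collapse into a single time series.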

    Note: to compute these ratios per zone, for example, add zone to the first argument of group_by, like this: group_by [zone], sliding(10m), [countAll: ..., count5xx: ...] | value (count5xx / countAll)
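
    Written out in full, the by-zone variant described in the note would look something like the following. It is assembled from the query above with only the group_by argument changed, and assumes zone is available as a label on the gae_app resource:

    fetch gae_app
    | metric 'appengine.googleapis.com/http/server/response_count'
    | group_by [zone], sliding(10m), [
        countAll: sum(response_count),
        count5xx: sum(if(response_code >= 500 && response_code < 600, response_count, 0))]
    | value (count5xx / countAll)
    | every 1m

    This should produce one ratio time series per zone instead of a single overall series.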