google-cloud-platform terraform google-cloud-stackdriver

How to create an alert policy for unknown custom metric in GCP


Given the following alert policy in GCP (created with Terraform):

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"
  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      filter          = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_MAX"
        group_by_fields      = [
          "metric.label.\"uri\"",
          "metric.label.\"method\"",
          "metric.label.\"status\"",
          "metadata.user_labels.\"app.kubernetes.io/name\"",
          "metadata.user_labels.\"app.kubernetes.io/component\""
        ]
      }
      trigger {
        count   = 1
        percent = 0
      }
    }
  }
}

I get the following error (the policy is part of a Terraform project that also creates the cluster):

Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.

Now, this is a custom metric (published by a Spring Boot app with Micrometer), so it does not exist yet when the infrastructure is created. Does GCP have to know a metric before an alert can be created for it? Would that mean a Spring Boot app has to be deployed on the cluster and sending metrics before this policy can be created?

Am I missing something (for example, that this should not be done in Terraform as part of the infrastructure)?


Solution

  • Interesting question. The 404 error occurs because a resource was not found: the metric descriptor appears to be a prerequisite for the alert policy. I would create the metric descriptor first (you can use this as a reference) and then go on to create the alerting policy.

    This is one way you may avoid the error. Please comment if it makes sense, and if you get it working like this, share your result.
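
    As a minimal sketch of the suggestion above, the descriptor could be declared in the same Terraform project with the google_monitoring_metric_descriptor resource, and the policy's filter could reference it so that Terraform creates the descriptor first. The resource name latency_p95, the GAUGE/DOUBLE kind and type, and the "ms" unit are assumptions for illustration. They would need to match what the Micrometer registry actually writes, otherwise the application's writes may be rejected.

```hcl
# Assumption: the p95 metric is a GAUGE of DOUBLE values in milliseconds.
# Adjust metric_kind, value_type and unit to match what the app publishes.
resource "google_monitoring_metric_descriptor" "latency_p95" {
  description  = "95th percentile latency of HTTP server requests"
  display_name = "HTTP server requests p95"
  type         = "custom.googleapis.com/http/server/requests/p95"
  metric_kind  = "GAUGE"
  value_type   = "DOUBLE"
  unit         = "ms"

  # Labels used in the policy's group_by_fields.
  labels {
    key        = "uri"
    value_type = "STRING"
  }
  labels {
    key        = "method"
    value_type = "STRING"
  }
  labels {
    key        = "status"
    value_type = "STRING"
  }
}

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"
  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      # Referencing the descriptor's type creates an implicit dependency,
      # so the descriptor exists before the policy is created.
      filter          = "metric.type=\"${google_monitoring_metric_descriptor.latency_p95.type}\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_MAX"
        group_by_fields = [
          "metric.label.\"uri\"",
          "metric.label.\"method\"",
          "metric.label.\"status\"",
          "metadata.user_labels.\"app.kubernetes.io/name\"",
          "metadata.user_labels.\"app.kubernetes.io/component\""
        ]
      }
      trigger {
        count = 1
      }
    }
  }
}
```

    Interpolating the descriptor's type into the filter, rather than repeating the string, keeps the two resources in sync and removes the need for an explicit depends_on.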