google-cloud-platform google-cloud-dataflow google-cloud-monitoring

Dataflow custom counters not sent to Cloud Monitoring despite valid names


I am using custom metrics in my Dataflow job (Scala on Java 11). Several distribution metrics track the duration of the API calls my pipeline makes to an external service:

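import com.spotify.scio.ScioMetrics
import org.apache.beam.sdk.metrics.{Counter, Distribution}

// NAMESPACE is a String constant defined elsewhere in the job.
lazy val CONCURRENCY: Counter = ScioMetrics.counter(NAMESPACE, "concurrency")
lazy val SUCCESS_DISTRIBUTION: Distribution = ScioMetrics.distribution(
    NAMESPACE, "success_distribution_ms")
lazy val TIMEOUT_DISTRIBUTION: Distribution = ScioMetrics.distribution(
    NAMESPACE, "timeout_distribution_ms")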
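
A minimal sketch of how distributions like these could be updated around an API call. It is illustrative only and not taken from the actual pipeline: the names `CallTiming` and `timed`, and the choice to classify a call as timed out when it throws `java.util.concurrent.TimeoutException`, are assumptions.

```scala
import java.util.concurrent.TimeoutException
import scala.util.{Failure, Success, Try}

object CallTiming {
  /** Runs `call`, measures its wall-clock duration in milliseconds, and
    * reports that duration to `onSuccess` or `onTimeout` depending on the
    * outcome. Other failures are returned but not recorded in this sketch.
    */
  def timed[A](onSuccess: Long => Unit, onTimeout: Long => Unit)(call: => A): Try[A] = {
    val start = System.nanoTime()
    val result = Try(call)
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    result match {
      case Success(_)                   => onSuccess(elapsedMs)
      case Failure(_: TimeoutException) => onTimeout(elapsedMs)
      case Failure(_)                   => () // not a timeout: nothing recorded
    }
    result
  }
}
```

With the metrics above, a call site might look like `CallTiming.timed(ms => SUCCESS_DISTRIBUTION.update(ms), ms => TIMEOUT_DISTRIBUTION.update(ms))(callApi())`, where `callApi()` stands in for the real request.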

I can see that my pipeline is updating these metrics, because they appear in the job's detail view:

Counter name                    Value    Step
concurrency                     0        Call API /vectorize/map:2
success_distribution_ms_COUNT   441      Call API /vectorize/map:2
success_distribution_ms_MAX     28,984   Call API /vectorize/map:2
success_distribution_ms_MEAN    4,047    Call API /vectorize/map:2
success_distribution_ms_MIN     131      Call API /vectorize/map:2
timeout_distribution_ms_COUNT   122      Call API /vectorize/map:2
timeout_distribution_ms_MAX     30,059   Call API /vectorize/map:2
timeout_distribution_ms_MEAN    30,028   Call API /vectorize/map:2
timeout_distribution_ms_MIN     30,001   Call API /vectorize/map:2

However, in Stackdriver (Cloud Monitoring) I see only the following metrics: concurrency, success_distribution_ms_MAX, success_distribution_ms_MEAN, and success_distribution_ms_MIN. The rest are missing, including success_distribution_ms_COUNT and all of the timeout_distribution_ms_* metrics.

When I query the logs in the GCP Logs Explorer with:

protoPayload.serviceName="monitoring.googleapis.com"
protoPayload.methodName="google.monitoring.v3.MetricService.CreateMetricDescriptor"

I see only the 4 visible metrics being created:

[Screenshot: Logs Explorer results]

All the metric names appear to be valid according to the documentation and the Stack Overflow answer https://stackoverflow.com/a/48566003/507793, so I'm at a loss as to why these particular metrics aren't being propagated.

How do I get my metrics into Stackdriver?


Solution

  • It turns out there was an INFO-level log entry on my job indicating that the project had too many metric descriptors. I found it with this query:

    resource.type="dataflow_step"
    resource.labels.job_id="2023-08-16_07_24_17-6755882541878130557"
    logName="projects/threadloom-backend/logs/dataflow.googleapis.com%2Fjob-message"
    severity=INFO

    The log message reads:

    Your project already contains 100 Dataflow-created metric descriptors, so new user metrics of the form custom.googleapis.com/* will not be created. However, all user metrics are also available in the metric dataflow.googleapis.com/job/user_counter. If you rely on the custom metrics, you can delete old / unused metric descriptors. See https://developers.google.com/apis-explorer/#p/monitoring/v3/monitoring.projects.metricDescriptors.list and https://developers.google.com/apis-explorer/#p/monitoring/v3/monitoring.projects.metricDescriptors.delete

    After deleting some unused metric descriptors, my new metrics showed up.
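
The cleanup can also be scripted against the two Cloud Monitoring REST methods the log message links to (`projects.metricDescriptors.list` and `projects.metricDescriptors.delete`). The following is a hedged sketch that only builds the HTTP requests with the Java 11 `java.net.http` API. The object name `MetricDescriptorCleanup`, its method names, and the way the OAuth access token is obtained (for example from `gcloud auth print-access-token`) are assumptions, not something the original job used.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

object MetricDescriptorCleanup {
  private val BaseUrl = "https://monitoring.googleapis.com/v3"

  // Monitoring filter that matches only descriptors under custom.googleapis.com/.
  val CustomMetricFilter: String =
    """metric.type = starts_with("custom.googleapis.com/")"""

  /** GET request that lists the custom metric descriptors of a project. */
  def listRequest(projectId: String, accessToken: String): HttpRequest = {
    val filter = URLEncoder.encode(CustomMetricFilter, StandardCharsets.UTF_8)
    HttpRequest
      .newBuilder(URI.create(s"$BaseUrl/projects/$projectId/metricDescriptors?filter=$filter"))
      .header("Authorization", s"Bearer $accessToken")
      .GET()
      .build()
  }

  /** DELETE request for a single descriptor, identified by its metric type. */
  def deleteRequest(projectId: String, metricType: String, accessToken: String): HttpRequest =
    HttpRequest
      .newBuilder(URI.create(s"$BaseUrl/projects/$projectId/metricDescriptors/$metricType"))
      .header("Authorization", s"Bearer $accessToken")
      .DELETE()
      .build()
}
```

Each request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The list response is JSON containing a `metricDescriptors` array (and a `nextPageToken` when there are more pages), so review the returned types before deleting anything, because deleting a descriptor also discards its time series data.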