I am using custom metrics in my Dataflow job (Scala on Java 11). Several distribution metrics track the duration of the API calls my pipeline makes to an external service:
lazy val CONCURRENCY: Counter = ScioMetrics.counter(NAMESPACE, "concurrency")
lazy val SUCCESS_DISTRIBUTION: Distribution =
  ScioMetrics.distribution(NAMESPACE, "success_distribution_ms")
lazy val TIMEOUT_DISTRIBUTION: Distribution =
  ScioMetrics.distribution(NAMESPACE, "timeout_distribution_ms")
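For context, here is a minimal sketch of how distributions like these might be fed around an API call. It is not the pipeline's actual code. The `ApiTiming.timed` helper is hypothetical, and the sketch assumes that a timeout surfaces as a `java.util.concurrent.TimeoutException`. The recorders are plain `Long => Unit` functions so the snippet does not depend on Scio or Beam.

```scala
import java.util.concurrent.TimeoutException

object ApiTiming {

  /** Runs `call` and reports the elapsed milliseconds to exactly one recorder:
    * `onSuccess` if the call returns, `onTimeout` if it throws a TimeoutException.
    * Any other exception propagates without being recorded.
    */
  def timed[A](onSuccess: Long => Unit, onTimeout: Long => Unit)(call: => A): A = {
    val start = System.nanoTime()
    def elapsedMs: Long = (System.nanoTime() - start) / 1000000L
    try {
      val result = call
      onSuccess(elapsedMs)
      result
    } catch {
      case e: TimeoutException =>
        onTimeout(elapsedMs)
        throw e // rethrow so the caller still sees the failure
    }
  }
}

// Hypothetical wiring inside the pipeline step (`callApi` is a placeholder):
//   ApiTiming.timed(
//     ms => SUCCESS_DISTRIBUTION.update(ms),
//     ms => TIMEOUT_DISTRIBUTION.update(ms)
//   ) { callApi(input) }
```

Each distribution receives one `update` per call, which is why Dataflow can derive the `_COUNT`, `_MIN`, `_MAX`, and `_MEAN` values from it.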
I can see that my pipeline is reporting these metrics, because they appear in the job's detail view:
Counter name | Value | Step |
---|---|---|
concurrency | 0 | Call API /vectorize/map:2 |
success_distribution_ms_COUNT | 441 | Call API /vectorize/map:2 |
success_distribution_ms_MAX | 28,984 | Call API /vectorize/map:2 |
success_distribution_ms_MEAN | 4,047 | Call API /vectorize/map:2 |
success_distribution_ms_MIN | 131 | Call API /vectorize/map:2 |
timeout_distribution_ms_COUNT | 122 | Call API /vectorize/map:2 |
timeout_distribution_ms_MAX | 30,059 | Call API /vectorize/map:2 |
timeout_distribution_ms_MEAN | 30,028 | Call API /vectorize/map:2 |
timeout_distribution_ms_MIN | 30,001 | Call API /vectorize/map:2 |
However, I only see the following metrics within Stackdriver: concurrency, success_distribution_ms_MAX, success_distribution_ms_MEAN, and success_distribution_ms_MIN. The rest are missing (including success_distribution_ms_COUNT).
When reading logs in GCP Logs explorer with the following query:
protoPayload.serviceName="monitoring.googleapis.com"
protoPayload.methodName="google.monitoring.v3.MetricService.CreateMetricDescriptor"
I see only the 4 visible metrics being created.
All the metric names appear to be valid according to the documentation and the StackOverflow answer https://stackoverflow.com/a/48566003/507793, so I'm at a loss as to why these particular metrics aren't being propagated.
How do I get my metrics into Stackdriver?
It turns out there was an INFO-level log entry on my job indicating that the project had too many metric descriptors. This query finds it:
resource.type="dataflow_step"
resource.labels.job_id="2023-08-16_07_24_17-6755882541878130557"
logName="projects/threadloom-backend/logs/dataflow.googleapis.com%2Fjob-message"
severity=INFO
The message text contains:
Your project already contains 100 Dataflow-created metric descriptors, so new user metrics of the form custom.googleapis.com/* will not be created. However, all user metrics are also available in the metric dataflow.googleapis.com/job/user_counter. If you rely on the custom metrics, you can delete old / unused metric descriptors. See https://developers.google.com/apis-explorer/#p/monitoring/v3/monitoring.projects.metricDescriptors.list and https://developers.google.com/apis-explorer/#p/monitoring/v3/monitoring.projects.metricDescriptors.delete
After deleting some unused metrics, my new metrics showed up.
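The cleanup can also be scripted instead of going through the APIs Explorer. The following is a rough sketch, assuming the `google-cloud-monitoring` Java client is on the classpath and application default credentials are configured. The `MetricDescriptorCleanup` object, its method names, and the `dryRun` flag are illustrative, and the `custom.googleapis.com/` prefix is taken from the log message above. Check which descriptors are really unused before deleting any of them.

```scala
import com.google.cloud.monitoring.v3.MetricServiceClient
import com.google.monitoring.v3.{ListMetricDescriptorsRequest, ProjectName}

object MetricDescriptorCleanup {

  /** Pure helper: given (resourceName, metricType) pairs, returns the resource
    * names of descriptors whose metric type is NOT in `keepTypes`.
    */
  def selectStale(descriptors: Seq[(String, String)], keepTypes: Set[String]): Seq[String] =
    descriptors.collect { case (name, metricType) if !keepTypes.contains(metricType) => name }

  /** Lists (resourceName, metricType) for every custom.googleapis.com/* descriptor. */
  def listCustomDescriptors(client: MetricServiceClient, projectId: String): Seq[(String, String)] = {
    val request = ListMetricDescriptorsRequest.newBuilder()
      .setName(ProjectName.of(projectId).toString)
      .setFilter("metric.type = starts_with(\"custom.googleapis.com/\")")
      .build()
    val it = client.listMetricDescriptors(request).iterateAll().iterator()
    val found = Seq.newBuilder[(String, String)]
    while (it.hasNext) {
      val d = it.next()
      found += ((d.getName, d.getType))
    }
    found.result()
  }

  /** Deletes every custom descriptor not listed in `keepTypes`.
    * With dryRun = true (the default) it only prints what it would delete.
    */
  def deleteStale(projectId: String, keepTypes: Set[String], dryRun: Boolean = true): Seq[String] = {
    val client = MetricServiceClient.create()
    try {
      val stale = selectStale(listCustomDescriptors(client, projectId), keepTypes)
      stale.foreach { name =>
        println(s"${if (dryRun) "would delete" else "deleting"}: $name")
        if (!dryRun) client.deleteMetricDescriptor(name)
      }
      stale
    } finally client.close()
  }
}
```

Running `deleteStale` with the default `dryRun = true` first shows the candidates without removing anything. Alternatively, as the log message notes, all user metrics remain available under dataflow.googleapis.com/job/user_counter without any cleanup.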