kubernetes google-kubernetes-engine prometheus google-cloud-stackdriver hpa

HPA unable to find "untyped" custom metric from Prometheus / Stackdriver adapter


We are using HPA with custom metrics from a Java application in GKE. (More on that in this previous question.) We would now like to use an untyped metric, though.

What we have done

So far, we did this:

  1. Enabled Managed Prometheus
  2. Installed JMX Exporter on JVMs and configured it to export the desired metric
  3. Deployed the Stackdriver Adapter for Custom Metrics

Then we created an HPA that responds to the total memory used by the JVM heap. It worked, and pods were scaled up and down:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscale
  namespace: somenamespace
spec:
  maxReplicas: 3
  metrics:
  - pods:
      metric:
        name: prometheus.googleapis.com|jvm_memory_bytes_used|gauge
        selector:
           matchLabels:
             metric.labels.area: heap
      target:
        averageValue: 2G
        type: AverageValue
    type: Pods
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-app

What we want to do

Now, we would like to make it respond to the java_lang_threading_threadcount metric. According to its description, it is untyped:

# HELP java_lang_threading_threadcount java.lang:name=null,type=Threading,attribute=ThreadCount
# TYPE java_lang_threading_threadcount untyped
java_lang_threading_threadcount 123.0

How we proceeded

Following the instructions in the Stackdriver adapter's documentation, I went to Metrics Explorer and looked up the full name of the metric:

Screenshot showing that in Metrics Explorer the full name of the metrics is prometheus.googleapis.com/java_lang_threading_threadcount/unknown

So the fully qualified metric name is prometheus.googleapis.com/java_lang_threading_threadcount/unknown.

I added it to the HPA spec (with the slashes replaced by pipes, as for the previous metric) and applied it:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscale
  namespace: somenamespace
spec:
  maxReplicas: 3
  metrics:
  - pods:
      metric:
        name: prometheus.googleapis.com|java_lang_threading_threadcount|unknown
        selector:
           matchLabels:
             metric.labels.area: heap
      target:
        averageValue: 200
        type: AverageValue
    type: Pods
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-app

What we've got

It didn't work, however. I got this message:

$ k get hpa -o yaml my-autoscale  | grep unable -A7
    message: 'the HPA was unable to compute the replica count: unable to get metric
      prometheus.googleapis.com|java_lang_threading_threadcount|unknown: unable to
      fetch metrics from custom metrics API: googleapi: Error 404: Cannot find metric(s)
      that match type = "prometheus.googleapis.com/java_lang_threading_threadcount/unknown"
      label = area label = pod. If a metric was created recently, it could take up
      to 10 minutes to become available. Please try again soon., notFound'
    reason: FailedGetPodsMetric
    status: "False"
    type: ScalingActive

The question

What should be the name of the metric, then, if not prometheus.googleapis.com/java_lang_threading_threadcount/unknown?


Solution

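  • The name of the metric is actually right. The problem is that we left in the spec the label selector we had used with the previous metric. java_lang_threading_threadcount is exported without an area label, so the adapter found no time series matching type and label = area, hence the 404.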

      - pods:
          metric:
            name: prometheus.googleapis.com|java_lang_threading_threadcount|unknown
            # selector:                       # This part is wrong
            #    matchLabels:                 # Once we removed it
            #      metric.labels.area: heap   # everything worked
          target:
    

    Once we removed the label selector, the HPA worked.
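
For reference, here is a sketch of what the complete corrected manifest would presumably look like. It is the spec from the question with only the selector block removed. All names and values are the ones used above, and the key order has been rearranged for readability, which does not change its meaning.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscale
  namespace: somenamespace
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-app
  metrics:
  - type: Pods
    pods:
      metric:
        # No selector here: the thread count metric is exported without
        # an "area" label, so a matchLabels filter on it matches nothing.
        name: prometheus.googleapis.com|java_lang_threading_threadcount|unknown
      target:
        type: AverageValue
        averageValue: 200
```

If a selector is needed for another metric, it should only use labels that the metric actually carries in the exporter output (the `# TYPE` sample above shows none for this one).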