kubernetes, google-cloud-platform, google-kubernetes-engine, stackdriver, hpa

Error scaling up in HPA in GKE: apiserver was unable to write a JSON response: http2: stream closed


I am following the guide that Google provides for deploying an HPA in Google Kubernetes Engine: https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics

I also added the required permissions, because I am using Workload Identity, following this guide: https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter
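
Concretely, the Workload Identity setup from that guide boils down to two commands like these (PROJECT_ID and GSA_NAME are placeholders for my project and for a Google service account that has roles/monitoring.viewer):

# Let the adapter's Kubernetes service account impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com

# Point the Kubernetes service account at the Google service account
kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  --namespace custom-metrics \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com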

And I also added the firewall rule mentioned here: https://github.com/kubernetes-sigs/prometheus-adapter/issues/134
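
For reference, that rule opens the adapter's secure port (4443, see the manifest below) from the GKE control plane to the nodes; roughly like this, where MASTER_IPV4_CIDR and NODE_TAG are placeholders for my cluster's values:

gcloud compute firewall-rules create apiserver-to-custom-metrics \
  --allow tcp:4443 \
  --source-ranges MASTER_IPV4_CIDR \
  --target-tags NODE_TAG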

I am stuck at the point where the HPA reports this error:

kubectl describe hpa -n test-namespace
Name:                  my-hpa
Namespace:             test-namespace
Labels:                <none>
Annotations:           <none>
CreationTimestamp:     Tue, 13 Apr 2021 12:47:56 +0200
Reference:             StatefulSet/my-set
Metrics:               ( current / target )
  "my-metric" on pods:  <unknown> / 1
Min replicas:          1
Max replicas:          60
StatefulSet pods:      1 current / 0 desired
Conditions:
  Type           Status  Reason               Message
  ----           ------  ------               -------
  AbleToScale    True    SucceededGetScale    the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetPodsMetric  the HPA was unable to compute the replica count: unable to get metric my-metric: no metrics returned from custom metrics API
Events:
  Type     Reason                        Age                   From                       Message
  ----     ------                        ----                  ----                       -------
  Warning  FailedGetPodsMetric           8m26s (x40 over 18m)  horizontal-pod-autoscaler  unable to get metric my-metric: no metrics returned from custom metrics API
  Warning  FailedComputeMetricsReplicas  3m26s (x53 over 18m)  horizontal-pod-autoscaler  failed to compute desired number of replicas based on listed metrics for StatefulSet/test-namespace/my-set: invalid metrics (1 invalid out of 1), first error is: failed to get pods metric value: unable to get metric my-metric: no metrics returned from custom metrics API
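
For reference, the manifest behind this HPA targets my-metric as a Pods metric; reconstructed from the describe output above (the original manifest is not shown here, so treat this as an approximation), it looks roughly like this:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
  namespace: test-namespace
spec:
  maxReplicas: 60
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-set
  metrics:
  - type: Pods
    pods:
      metric:
        name: my-metric
      target:
        type: AverageValue
        averageValue: 1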

But the APIServices all show AVAILABLE as True:

kubectl get apiservices
NAME                                     SERVICE                                             AVAILABLE   AGE
...
v1beta1.custom.metrics.k8s.io            custom-metrics/custom-metrics-stackdriver-adapter   True        24h
v1beta1.external.metrics.k8s.io          custom-metrics/custom-metrics-stackdriver-adapter   True        24h
v1beta2.custom.metrics.k8s.io            custom-metrics/custom-metrics-stackdriver-adapter   True        24h
...

And when I query the metric directly, it returns data correctly:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/test-namespace/pods/*/my-metric" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta2/namespaces/test-namespace/pods/%2A/my-metric"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "test-namespace",
        "name": "my-metrics-api-XXXX",
        "apiVersion": "/__internal"
      },
      "metric": {
        "name": "my-metric",
        "selector": null
      },
      "timestamp": "2021-04-13T11:15:30Z",
      "value": "5"
    }
  ]
}

But the Stackdriver adapter logs this error:

2021-04-13T11:01:30.432634Z apiserver was unable to write a JSON response: http2: stream closed
2021-04-13T11:01:30.432679Z apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}
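
Those two lines come from the adapter's own logs:

kubectl logs -n custom-metrics deployment/custom-metrics-stackdriver-adapter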

I had to configure the adapter that Google provides like this:

apiVersion: v1
kind: Namespace
metadata:
  name: custom-metrics
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-metrics-stackdriver-adapter
  namespace: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: custom-metrics-stackdriver-adapter
  namespace: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-stackdriver-adapter
  namespace: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: custom-metrics-stackdriver-adapter
  namespace: custom-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metrics-stackdriver-adapter
  namespace: custom-metrics
  labels:
    run: custom-metrics-stackdriver-adapter
    k8s-app: custom-metrics-stackdriver-adapter
spec:
  replicas: 1
  selector:
    matchLabels:
      run: custom-metrics-stackdriver-adapter
      k8s-app: custom-metrics-stackdriver-adapter
  template:
    metadata:
      labels:
        run: custom-metrics-stackdriver-adapter
        k8s-app: custom-metrics-stackdriver-adapter
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: custom-metrics-stackdriver-adapter
      containers:
      - image: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.12.0-gke.0
        imagePullPolicy: Always
        name: pod-custom-metrics-stackdriver-adapter
        command:
        - /adapter
        - --use-new-resource-model=true
        - --cert-dir=/tmp
        - --secure-port=4443
        resources:
          limits:
            cpu: 250m
            memory: 200Mi
          requests:
            cpu: 250m
            memory: 200Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: custom-metrics-stackdriver-adapter
    k8s-app: custom-metrics-stackdriver-adapter
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Adapter
  name: custom-metrics-stackdriver-adapter
  namespace: custom-metrics
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 4443
  selector:
    run: custom-metrics-stackdriver-adapter
    k8s-app: custom-metrics-stackdriver-adapter
  type: ClusterIP
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  versionPriority: 100
  service:
    name: custom-metrics-stackdriver-adapter
    namespace: custom-metrics
  version: v1beta1
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta2.custom.metrics.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  versionPriority: 200
  service:
    name: custom-metrics-stackdriver-adapter
    namespace: custom-metrics
  version: v1beta2
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: external.metrics.k8s.io
  groupPriorityMinimum: 100
  versionPriority: 100
  service:
    name: custom-metrics-stackdriver-adapter
    namespace: custom-metrics
  version: v1beta1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-reader
rules:
- apiGroups:
  - "external.metrics.k8s.io"
  resources:
  - "*"
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system

Port 443 was not usable in my case, so I had to change the secure port to 4443. I also had to add the --cert-dir=/tmp option, because without it the adapter fails with this error:

"unable to run custom metrics adapter: error creating self-signed certificates: mkdir apiserver.local.config: permission denied"

I think I have covered all the steps I took to configure it, without success. Any ideas?


Solution

  • Resolved for me!

    After several tests, I changed two things in the HPA YAML: the metric type from Pods to External, and the metric name to custom.googleapis.com/my-metric. Now it works:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-hpa
      namespace: test-namespace
    spec:
      maxReplicas: 60
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: my-set
      metrics:
      - type: External
        external:
          metric:
            name: custom.googleapis.com/my-metric
          target:
            averageValue: 1
            type: AverageValue
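
    To double-check that the External metric is being served before the HPA reads it, the external metrics API can be queried directly; the adapter exposes the metric name with the / replaced by |:

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/test-namespace/custom.googleapis.com|my-metric" | jq .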