In a recent experiment, I tried to autoscale my K8s cluster using two mechanisms: KEDA and HPA (see below). I wanted to use HPA's out-of-the-box resource metrics to scale based on pod resource utilization (memory and CPU), and KEDA to autoscale based on custom metrics.
Even though my deployment succeeded and the cluster was healthy and functional, the cluster went haywire when autoscaling kicked in! Pods were constantly being provisioned and then de-provisioned, and this continued even after I stopped the traffic against the cluster. I had to wait for the cool-down periods to pass before it settled down again.
I didn't find any official documentation on this topic, so I'm asking here.
My questions:
This was on K8s version 1.15.11 and KEDA 1.4.1
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: {{ $fullName }}
  labels:
    deploymentName: {{ $fullName }}
    {{- include "deployment.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    deploymentName: {{ $fullName }}
  pollingInterval: {{ .Values.scaleobject.pollingInterval }}
  cooldownPeriod: {{ .Values.scaleobject.cooldownPeriod }}
  minReplicaCount: {{ .Values.scaleobject.minReplicaCount }}
  maxReplicaCount: {{ .Values.scaleobject.maxReplicaCount }}
  triggers:
  - type: prometheus
    metadata:
      serverAddress: {{ tpl .Values.scaleobject.serverAddress . | quote }}
      metricName: access_frequency
      threshold: "{{ .Values.scaleobject.threshold }}"
      query: {{ tpl .Values.scaleobject.query . | quote }}
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-utilization-scaling
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ $fullName }}
  minReplicas: {{ .Values.scaleobject.minReplicaCount }}
  maxReplicas: {{ .Values.scaleobject.maxReplicaCount }}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: {{ .Values.scaleobject.cpuUtilization }}
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: {{ .Values.scaleobject.memUtilization }}
KEDA doesn't have direct cluster autoscaler support yet, so you will see some unpredictability. In essence, you have two sources of information that are not shared with each other, KEDA's and the cluster autoscaler's, and they may not agree at any given moment.
In my opinion, the best approach is to slow down autoscaling across the board so that every autoscaler has time to catch up with any discrepancy. For example, you can use settings such as the cooldown on an autoscaling group to avoid resource starvation.
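As a rough sketch of what "slowing down" could look like with the chart in the question: the ScaledObject template reads pollingInterval and cooldownPeriod from .Values.scaleobject, so the values file could simply raise both. The numbers below are made up for illustration and are not recommendations; the file name values.yaml is also an assumption about how the chart is laid out.

```yaml
# values.yaml (hypothetical fragment; only the timing keys are shown,
# the other scaleobject.* keys used by the templates stay as they are)
scaleobject:
  # Seconds between KEDA checks of the Prometheus trigger.
  # A larger value makes KEDA react more slowly to metric changes.
  pollingInterval: 60
  # Seconds KEDA waits after the last trigger reported active
  # before it scales the deployment back down.
  cooldownPeriod: 600
```

Whether values like these are enough to stop the flapping depends on the workload, so treat them as a starting point to experiment with.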
✌️