kubernetes terraform prometheus minikube keda

How to debug a KEDA + Prometheus autoscaling deployment


I have this repo: terraform1

I created a minikube cluster using VirtualBox (the Docker driver also works, but Docker has a lot of issues on my machine, especially with networking):

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start --driver=virtualbox

Then I built a dummy Docker image that exposes Prometheus metrics and pushed it to my Docker Hub:

docker login -u kokizzu 
docker build -t pf1 .
docker image ls pf1
# REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
# pf1          latest    204670ee86bd   2 minutes ago   89.3MB

docker image tag pf1 kokizzu/pf1:latest
docker image push kokizzu/pf1:latest

Then I made minikube able to pull from a Docker registry that requires docker login:

minikube addons configure registry-creds
Do you want to enable AWS Elastic Container Registry? [y/n]: n
Do you want to enable Google Container Registry? [y/n]: n
Do you want to enable Docker Registry? [y/n]: y
-- Enter docker registry server url:
--Error, please enter a value:-- Enter docker registry server url: https://hub.docker.com/
-- Enter docker registry username: kokizzu
-- Enter docker registry password:
Do you want to enable Azure Container Registry? [y/n]: n
✅  registry-creds was successfully configured

Then I used terraform-kubernetes and terraform-helm to deploy KEDA (HCL can reference variables more easily than YAML, although YAML handles inheritance better):

terraform init
terraform apply

It created the pods properly with no issue:

alias k='minikube kubectl --'
k get pods -w -n pf1ns
NAME                                               READY   STATUS    RESTARTS  AGE
keda-admission-webhooks-78dcd76878-h797w           1/1     Running   0         17m
keda-operator-c76d89655-f2dc5                      1/1     Running   0         17m
keda-operator-metrics-apiserver-6cdcd87dfd-9gcts   1/1     Running   0         17m
pf1deploy-6657c4d485-q569h                         1/1     Running   0         27h
prom1stateful-0                                    1/1     Running   0         24h

minikube service list
|-------------|---------------------------------|--------------|-----------------------------|
|  NAMESPACE  |              NAME               | TARGET PORT  |             URL             |
|-------------|---------------------------------|--------------|-----------------------------|
| default     | kubernetes                      | No node port |                             |
| kube-system | kube-dns                        | No node port |                             |
| pf1ns       | keda-admission-webhooks         | No node port |                             |
| pf1ns       | keda-operator                   | No node port |                             |
| pf1ns       | keda-operator-metrics-apiserver | No node port |                             |
| pf1ns       | pf1svc                          |        33000 | http://192.168.59.100:31344 |
| pf1ns       | prom1svc                        |        10902 | http://192.168.59.100:30958 |
|-------------|---------------------------------|--------------|-----------------------------|

Then I ran a load test using hey:

hey -c 100 -n 1000000 http://192.168.59.100:31344

The requests to that address show up properly in Prometheus:

[screenshot: Prometheus UI]

There are no errors at all in the KEDA pod logs:

k logs -f keda-admission-webhooks-78dcd76878-h797w -n pf1ns
k logs -f keda-operator-metrics-apiserver-6cdcd87dfd-9gcts -n pf1ns 
k logs -f keda-operator-c76d89655-f2dc5 -n pf1ns
2023-06-27T21:59:10Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"pf1keda","namespace":"pf1ns"}, "namespace": "pf1ns", "name": "pf1keda", "reconcileID": "7409b0a1-5363-4813-99bf-bdcfd3101ee8"}

How can I debug which part of the autoscaling is not working? (k get pods -w -n pf1ns doesn't show any increase in pods during the load test.)
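
For reference, the chain worth inspecting is the ScaledObject, then the HPA that KEDA generates from it, then the external metrics API that feeds that HPA. What follows is a minimal sketch, not a definitive procedure. It assumes bash or sh, the ScaledObject name pf1keda and namespace pf1ns seen in the operator log above, and KEDA's default HPA naming (keda-hpa-<ScaledObject name>). The KUBECTL variable and the keda_debug function are hypothetical names introduced here for illustration.

```shell
# Sketch: walk the KEDA scaling chain for one ScaledObject.
# KUBECTL is a hypothetical override so the same function works with
# `minikube kubectl --`; it is left unquoted on purpose so that sh/bash
# split it into words (zsh does not split unquoted variables by default).
keda_debug() (
  ns="$1"   # namespace, e.g. pf1ns
  so="$2"   # ScaledObject name, e.g. pf1keda
  kc="${KUBECTL:-kubectl}"

  # 1. Does the ScaledObject report READY and ACTIVE?
  $kc get scaledobject "$so" -n "$ns"
  # 2. Conditions and events often name the failing trigger.
  $kc describe scaledobject "$so" -n "$ns"
  # 3. The HPA created by KEDA shows the current metric value vs. the target.
  $kc describe hpa "keda-hpa-$so" -n "$ns"
  # 4. Is the external metrics API (served by KEDA) registered at all?
  $kc get --raw "/apis/external.metrics.k8s.io/v1beta1"
)
```

With the alias above it could be called as KUBECTL='minikube kubectl --' keda_debug pf1ns pf1keda. If the HPA reports the current metric value as <unknown>, or the value stays at 0 under load, the metric query is a more likely suspect than KEDA itself.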


Solution

  • I still don't know how to debug this in general, but in my case, with no error message anywhere, the problem was that the Prometheus query was wrong. Run the query in the Prometheus UI to confirm that it returns the expected result.
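
The same check can be done from the command line through the Prometheus HTTP API (/api/v1/query), which is handy for testing the exact string used in the ScaledObject's query field. This is a minimal sketch under assumptions: the function names prom_query and prom_result_is_empty are hypothetical, the base URL is the prom1svc address from the service list, and the emptiness test relies on Prometheus returning compact JSON. As far as I know, KEDA's Prometheus scaler treats an empty result as 0 by default (ignoreNullValues), which would explain why a wrong query produces no error in the logs.

```shell
# Sketch: run a PromQL query through the Prometheus HTTP API.
# $1 = Prometheus base URL, $2 = PromQL expression
prom_query() {
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}

# Reads an API response on stdin and succeeds when it holds no samples,
# which is the typical sign of a wrong metric name or label selector.
# Assumes the compact form {"data":{...,"result":[]}} in the response.
prom_result_is_empty() {
  grep -qF '"result":[]'
}
```

For example, with a hypothetical metric name: prom_query http://192.168.59.100:30958 'sum(rate(http_requests_total[1m]))' | prom_result_is_empty && echo "query returns no data". If that message is printed while the load test is running, the query does not match what the application actually exports.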