kubernetes flask azure-aks hpa

If there are not enough pods in my Azure Kubernetes Service (AKS), I get an error response


I am using a Horizontal Pod Autoscaler (HPA) in AKS (the file is shown below). My containers run a Flask API server that handles POST requests. I used this line to run Flask in threaded mode:

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5003, threaded=True)

When I make 20 calls against Flask running locally, it handles them, albeit very slowly. When I make 20 calls against AKS the first time (so only 1 pod is running), it gives me error responses. The second time, I get 20 responses without any errors (the number of pods has increased).

Now I am trying to figure out why it does not wait for an existing pod to become available or for a new pod to be created. I thought there was a part of AKS that would do that.

Please let me know if I am missing something!
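For reference, the burst of 20 concurrent calls can be reproduced with a small client like the sketch below (the URL and payload are placeholders, not the real API):

```python
# Minimal sketch of the load test described above: 20 concurrent POST
# requests fired at the API. URL and payload are hypothetical.
import concurrent.futures
import urllib.error
import urllib.request


def call_api(url: str, payload: bytes = b"{}") -> int:
    """POST once and return the HTTP status code (or the error code)."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 502/503 while no pod is ready


def fire(url: str, n: int = 20) -> list[int]:
    """Send n requests concurrently, mimicking the burst in the question."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda _: call_api(url), range(n)))
```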

Deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: *hidden*
spec:
  selector:
    matchLabels:
      app: *hidden*
  template:
    metadata:    
      labels:
        app: *hidden*
    spec:
      containers:
      - name: *hidden*
        image: *hidden*
        env:
        - name: *hidden*
          valueFrom:
            secretKeyRef:
              name: *hidden*
              key: *hidden*
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "300m"
            memory: "400Mi"
          limits:
            cpu: "300m"
            memory: "400Mi"
        ports:
        - containerPort: 5003

      imagePullSecrets:
      - name: *hidden*
---

apiVersion: v1
kind: Service
metadata:
  name: *hidden*
spec:
  selector:
    app: *hidden*
  ports:
  - port: 5003
    protocol: TCP
    targetPort: 5003
  type: LoadBalancer

hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: *hidden*
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: *hidden*
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 20
        periodSeconds: 60
    scaleDown:
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
  

Solution

  • From your description, it seems your Flask API server is not able to handle a burst of 20 requests at once. This can be due to insufficient CPU and memory resources allocated to your containers in the AKS cluster.

    When you send 20 requests at once, the first pod might get overwhelmed with requests and start responding with error messages. However, when the HPA kicks in and scales up the number of pods, the load is distributed among multiple pods, allowing them to handle the requests without any errors.
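    The HPA's scaling decision follows the documented utilization formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds, so with your 20% target a single busy pod quickly justifies several replicas. A quick sketch (the utilization values are illustrative):

```python
# Sketch of the HPA scaling formula from the Kubernetes docs:
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
# clamped to the minReplicas/maxReplicas bounds from hpa.yaml.
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    desired = math.ceil(
        current_replicas * current_utilization / target_utilization
    )
    return max(min_replicas, min(desired, max_replicas))


# With a 20% target, one pod running at 100% CPU asks for 5 replicas:
# desired_replicas(1, 100, 20) -> 5
```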

    By increasing the resource requests and limits in your deployment configuration, you can ensure that each pod has sufficient resources allocated to handle the expected load. This can help avoid errors due to resource exhaustion and provide a better experience for your users.

    In your deployment configuration, you have set the resource requests and limits to "300m" CPU and "400Mi" memory. You may need to increase these values based on the load that your Flask API server is expected to handle. You can also use tools like Kubernetes Dashboard or Prometheus to monitor the resource usage of your containers and adjust the resource limits accordingly.

    Try increasing the CPU and memory requests and limits based on your findings, for example:

    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"
      requests:
        cpu: "250m"
        memory: "256Mi"
    

    More on requests and limits in the Kubernetes documentation on managing container resources.

    I would also suggest using a readiness probe; it can help in such cases by ensuring that your application is fully available before it receives any traffic. For your Flask application running in Kubernetes, a readiness probe verifies that the application has fully started and is ready to receive traffic. This helps avoid situations where the application is not yet available when the Service starts routing to it, which can result in errors or delays for clients.

    To configure the readiness probe, you can add the following section to your deployment spec:

    spec:
      containers:
      - name: <container-name>
        readinessProbe:
          httpGet:
            path: /<health-check-endpoint>
            port: <container-port>
          initialDelaySeconds: 10
          periodSeconds: 5
    

    Replace <container-name> with the name of your container, <health-check-endpoint> with the endpoint your Flask application exposes to check its health status, and <container-port> with the port your Flask application listens on.

    For example, if your Flask application exposes a health check endpoint at /health and listens to port 5003, the readiness probe configuration would look like this:

    spec:
      containers:
      - name: *hidden*
        image: *hidden*
        ...
        readinessProbe:
          httpGet:
            path: /health
            port: 5003
          initialDelaySeconds: 10
          periodSeconds: 5
    

    This will ensure that Kubernetes only sends traffic to the container when it is actually ready to handle it.

    You can get a good understanding of readiness probes in the Kubernetes documentation on configuring liveness and readiness probes.

    Just for more clarity: in the case of a Flask app, you can use a route to implement the readiness endpoint. Here's an example:

    from flask import Flask, jsonify
    
    app = Flask(__name__)
    
    @app.route('/healthz')
    def healthz():
        return jsonify({'status': 'ok'})
    
    @app.route('/api')
    def api():
        # your API logic here
        return jsonify({'result': 'success'})
    
    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)
    

    In this example, we've added a new route '/healthz' that returns a JSON response indicating that the app is healthy. You can use this route as the target of your readiness probe. (This Flask code is just for illustration and may need adapting.)

    To specify the readiness probe in your deployment, you can add the following to your container spec:

    readinessProbe:
      httpGet:
        path: /healthz
        port: 5000
      initialDelaySeconds: 10
      periodSeconds: 5
    

    This specifies that the readiness probe should perform an HTTP GET request to the '/healthz' endpoint on port 5000, with an initial delay of 10 seconds and a period of 5 seconds between probes.

    Once you've added this to your deployment, Kubernetes will use the readiness probe to determine when your containers are ready to receive traffic. If the readiness probe fails, Kubernetes will stop sending traffic to that container until it becomes ready again.
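    To make that last point concrete, here is a hedged sketch of the kubelet's readiness bookkeeping with the default thresholds (failureThreshold=3, successThreshold=1). This is an illustrative model, not the actual kubelet code: the container only flips to not-ready after the probe fails several times in a row, and flips back after a success.

```python
# Illustrative model of readiness-probe bookkeeping (NOT kubelet source):
# with failureThreshold=3, a container is marked not-ready only after
# 3 consecutive probe failures; with successThreshold=1, a single
# success marks it ready again.
def ready_over_time(results, failure_threshold=3, success_threshold=1):
    """Given a sequence of probe outcomes (True = success), return the
    container's ready state after each probe, starting from not-ready."""
    ready = False
    failures = successes = 0
    states = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states
```

    While a pod is not-ready, the Service's endpoints exclude it, which is exactly the mechanism that keeps traffic away from a pod that is still starting up.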