kubernetes, server, cloud, microservices, prometheus

`irate(container_cpu_usage_seconds_total{...}[{duration}s])` returns no data in Prometheus


I deployed Prometheus on my minikube cluster, which has 5 nodes. I also deployed a microservice application on the cluster. Prometheus collects data normally.

I use wrk2, a workload generator, to send requests to my microservices. Jaeger shows that the requests are processed normally.

Here is what confuses me. After testing the service for `duration` seconds, I use `sum(irate(container_cpu_usage_seconds_total{{{constraint}}}[{duration}s])) by (container, pod)` to get the CPU usage of the pods. However, the vast majority of pods show no CPU usage, that is, the query returns no results for them. This surprised me because during those `duration` seconds I increased the load on the service (i.e., sent a lot of requests to it), yet the reported CPU usage did not increase compared to when there was no load.

The following is the Python function I use to query Prometheus:

import requests
import pandas as pd
from time import sleep

# endtime = starttime + duration; starttime is the time when I start wrk2 to generate the workload
def get_cpu_usage(self, starttime, endtime, duration, display=False):
    # Define the Prometheus query to get the CPU usage of each service,
    # expressed as a percentage of the container's CPU quota
    constraint = f'namespace="{self.namespace}", container!="POD", container!=""'
    prometheus_query = (
        f"sum(irate(container_cpu_usage_seconds_total{{{constraint}}}[{duration}s])) by (container, pod)"
        + " / " + f"(sum(container_spec_cpu_quota{{{constraint}}}/({duration}*1000)) by (container, pod)) * 100"
    )

    # Send the range query to the Prometheus endpoint (one point per second)
    sleep(1)
    response = requests.get(self.prometheus_url + '/api/v1/query_range', params={
        'query': prometheus_query,
        'start': starttime,
        'end': endtime,
        'step': 1
    })

    # Parse the response
    usage = response.json()
    cpu_result = pd.DataFrame(columns=["microservice", "pod", "usage"])

Is there a bug in the code or in the Prometheus settings?


Solution

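  • I changed the duration to 1m, and the query works well. A likely explanation: irate() uses the last two samples inside the range window, so a window that is too short relative to the scrape interval can contain fewer than two samples and return no data.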
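
The fix can be sketched in Python as a small helper that widens the range window before building the query. This is only an illustration, not the asker's actual code: `rate_window_seconds` and `build_cpu_query` are hypothetical names, the 30-second scrape interval is an assumed value (check `scrape_interval` in your own Prometheus configuration), and the factor of 4 is a common rule of thumb rather than a hard requirement.

```python
def rate_window_seconds(duration_s: int, scrape_interval_s: int = 30, factor: int = 4) -> int:
    """Return a range window (in seconds) wide enough to hold several samples.

    scrape_interval_s=30 is an assumption; use the value from your Prometheus config.
    """
    return max(int(duration_s), factor * int(scrape_interval_s))


def build_cpu_query(namespace: str, window_s: int) -> str:
    """Build the per-pod CPU usage query from the question with a given window."""
    constraint = f'namespace="{namespace}", container!="POD", container!=""'
    return (
        f"sum(irate(container_cpu_usage_seconds_total{{{constraint}}}[{window_s}s]))"
        " by (container, pod)"
    )


if __name__ == "__main__":
    # A 10-second test would otherwise produce a [10s] window, which may hold
    # fewer than two samples; the helper widens it instead.
    window = rate_window_seconds(duration_s=10)
    print(build_cpu_query("default", window))
```

The test duration still determines the `start` and `end` of the range query; only the window inside the square brackets is widened so that `irate()` has enough samples to work with.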