I deployed Prometheus on my minikube cluster, which has 5 nodes. I also deployed a microservice application on the cluster. Prometheus collects metrics normally.
I use wrk2, a workload generator, to send requests to my microservices. Jaeger shows that the requests are processed normally.
Here is what confuses me. After load-testing the service for duration seconds, I run the following query to get the CPU usage of the pods:

    sum(irate(container_cpu_usage_seconds_total{{{constraint}}}[{duration}s])) by (container, pod)

However, the vast majority of pods appear to have zero CPU usage, i.e. the query returns no results for them. This surprised me, because during those duration seconds I increased the load on the service (i.e., sent a lot of requests to it), yet the CPU usage did not increase compared to when there was no load.
The following is the Python function I use to query Prometheus:

    # endtime = starttime + duration; starttime is the time when I start wrk2 to generate the workload
    def get_cpu_usage(self, starttime, endtime, duration, diaplay=False):
        # Define Prometheus query to get CPU usage for each service
        constraint = f'namespace="{self.namespace}", container!="POD", container!=""'
        prometheus_query = (
            f"sum(irate(container_cpu_usage_seconds_total{{{constraint}}}[{duration}s])) by (container, pod)"
            + " / " + f"(sum(container_spec_cpu_quota{{{constraint}}}/({duration}*1000)) by (container, pod)) * 100"
        )
        # Send query to Prometheus endpoint
        sleep(1)
        response = requests.get(self.prometheus_url + '/api/v1/query_range', params={
            'query': prometheus_query,
            'start': starttime,
            'end': endtime,
            'step': 1
        })
        # Parse response
        usage = response.json()
        cpu_result = pd.DataFrame(columns=["microservice", "pod", "usage"])

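
The snippet above stops before the parsing step. As a rough sketch of what that step could look like (not the exact code from my function), the helper below flattens a `query_range` response into rows matching the `microservice`, `pod` and `usage` columns. The helper name `parse_range_response`, the choice to map the `container` label to `microservice`, and averaging the samples of each series are all assumptions made for illustration. It relies only on the standard matrix response shape of the Prometheus HTTP API, where each series carries a `metric` label map and a `values` list of `[timestamp, "value"]` pairs.

```python
import math
from statistics import mean


def parse_range_response(payload):
    """Flatten a Prometheus query_range JSON payload into a list of row dicts.

    Hypothetical helper: the column mapping and the use of the mean are
    illustrative assumptions, not part of the original function.
    """
    rows = []
    if payload.get("status") != "success":
        return rows
    for series in payload.get("data", {}).get("result", []):
        labels = series.get("metric", {})
        # Each sample is [unix_timestamp, "value-as-string"].
        samples = [float(value) for _, value in series.get("values", [])]
        # Skip NaN/Inf samples, e.g. produced by a division by a zero quota.
        samples = [s for s in samples if math.isfinite(s)]
        if not samples:
            continue
        rows.append({
            "microservice": labels.get("container", ""),
            "pod": labels.get("pod", ""),
            "usage": mean(samples),
        })
    return rows
```

The returned rows can be passed to `pd.DataFrame(rows, columns=["microservice", "pod", "usage"])`. Pods for which Prometheus returns no series simply do not appear in the result, which matches the "no query results" symptom described above.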
Is there a bug in my code or in my Prometheus configuration?
When I change duration to 1m, it works well.
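
For reference, a minimal sketch of the query with the range window decoupled from the test duration, which is the variant that returned data. The helper name `build_cpu_rate_query` and its `window` parameter are hypothetical. One possible explanation, which depends on a scrape interval not stated here: `irate` computes its result from the last two samples inside the range window, so a window that contains fewer than two samples yields no result for that series.

```python
def build_cpu_rate_query(namespace, window="1m"):
    """Build the per-pod CPU rate query with an explicit range window.

    Hypothetical helper: `window` is any PromQL duration such as "1m" or "90s".
    It should span at least two scrapes so that irate has two samples to use.
    """
    constraint = f'namespace="{namespace}", container!="POD", container!=""'
    return (
        f"sum(irate(container_cpu_usage_seconds_total{{{constraint}}}[{window}])) "
        "by (container, pod)"
    )


print(build_cpu_rate_query("default"))
```

The start and end timestamps passed to `/api/v1/query_range` still bound the test interval. The window only controls how far back each evaluation step looks for samples.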