Tags: kubernetes, containers, prometheus, promql, cadvisor

Prometheus returns different results for the same pod


I'm having trouble understanding why I get multiple results for the same pod in Prometheus/Grafana.

I'm trying to get CPU usage with this query:

    rate(container_cpu_usage_seconds_total{namespace=~".+-test", pod=~"my-server-.+", image!~"|.*pause.*", container!="POD"}[5m])

The container!="POD" matcher excludes the series whose container label is POD. I found that these refer to the pause container, which holds the pod's namespaces and other shared resources before the application containers start.

However, I still got pause containers identified through the image label, so I excluded them there as well.

Then I found some series with no image label at all, and I excluded those too by adding an empty alternative (|) to the image regex.

In some cases the CPU usage of the series without the image label is lower than that of the "correct" container (the one with the expected image and container labels); in other cases it is very similar, but never identical.

Example:

[Image: Server 1]

[Image: Server 2]

I would like to understand what those extra series are and what they refer to.

PS: The metrics come from cAdvisor.


Solution

  • Try this query:

    rate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[5m])
    

    In short, CPU usage is reported at several levels (container, pod, QoS class), and the query above keeps only the containers that you defined explicitly. container!="POD" removes pause containers, and container=~".+" means "not empty". Only the per-container level carries a non-empty container label, so the series you saw without an image label most likely belong to one of the aggregate levels.
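
    To check what the extra series are, it can help to put them next to the per-container values. The two queries below are only a sketch: they reuse the namespace and pod patterns from the question, and they assume that your kubelet/cAdvisor version exposes the pod, container and image labels (older versions use pod_name and container_name) and that the series with empty container and image labels is the pod-level total. If that assumption holds, the two results should be close for each pod, with the pod-level value also covering the pause container and other overhead.

    ```promql
    # Assumed pod-level series: pod label set, container and image labels empty.
    rate(container_cpu_usage_seconds_total{namespace=~".+-test", pod=~"my-server-.+", container="", image=""}[5m])

    # Sum of the explicitly defined containers per pod, for comparison.
    sum by (namespace, pod) (
      rate(container_cpu_usage_seconds_total{namespace=~".+-test", pod=~"my-server-.+", container!="POD", container=~".+"}[5m])
    )
    ```

    Small differences between the two can also come from the series being sampled at slightly different moments, which may be why the values are similar but never the same.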