prometheus promql cadvisor

Obtaining CPU Usage with Prometheus and cAdvisor in Grafana


I'm pretty new to PromQL, and I'm running into an issue: I'm trying to show CPU usage per container in a "Time series" chart, but I can't figure out how to divide by the total number of cores (I prefer to view CPU utilization on a scale with a maximum of 100%). Here's the query I'm attempting to use:

sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) / sum(machine_cpu_cores)) by (name)

This doesn't work. I thought that since "sum(machine_cpu_cores)" is simply returning the sum of total cores (in my case 8), that I could divide by that, but I guess this isn't the case. Instead, I took that out and manually substituted the number 8 as shown below:

sum(rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) / 8) by (name)

Manually putting in "8" to represent the number of cores makes this work, but I wanted to use a query closer to the first example that returns the number of cores - instead of having to input the number. Is there something I can do to make that work?


Solution

  • As you probably guessed, the problem lies with your division operation.

    rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) returns a vector with the same labels as the metric container_cpu_usage_seconds_total, whereas sum(machine_cpu_cores) returns a vector with no labels.

    When dividing one vector by another, Prometheus pairs up series whose label sets match exactly and returns a result for each pair. Since no series on the left has the same (empty) label set as the series on the right, there are no pairs, and the query returns an empty result.
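
    A quick way to see the mismatch is to evaluate each operand on its own, for example in Grafana's Explore view. The sketch below assumes a fixed 5m range in place of $__rate_interval, so that it also runs in the Prometheus expression browser; the exact labels returned depend on your cAdvisor setup. Run the two expressions separately:

    # Left operand: one series per container, each carrying labels such as name
    rate(container_cpu_usage_seconds_total{name=~".+"}[5m])

    # Right operand: a single series with an empty label set
    sum(machine_cpu_cores)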

    There are two ways to fix this:

    Vector matching

    Use on() group_left().

    on() supplies the list of labels to be used for matching. In our case the list is empty, so every series on the left matches the single series on the right. But since the LHS has more than one series, you need to specify the many-to-one matching behavior.

    group_left() says that many LHS series may match the same RHS series: for every LHS series, the one matching RHS series is used in the operation.

    The resulting query looks like this:

    sum by (name) (
      rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) 
      / on() group_left() sum(machine_cpu_cores)
    ) 
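
    The query above yields a ratio between 0 and 1. If the panel should read 0 to 100 instead, one option (a sketch, not the only way) is to scale the same query by 100; alternatively, keep the ratio and choose the "Percent (0.0-1.0)" unit in the Grafana panel settings.

    100 * sum by (name) (
      rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval])
      / on() group_left() sum(machine_cpu_cores)
    )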
    

    Converting to scalar

    Since your divisor is always a single value, you can convert it to a scalar with the scalar() function and skip all the hassle of label matching.

    The resulting query looks like this:

    sum by (name) (
      rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval]) 
      / scalar(sum(machine_cpu_cores))
    ) 
    

    Note that this solution only works when one of the operands is guaranteed to have a single value (scalar() returns NaN otherwise), and it may be harder to maintain: if you later decide to add more dimensions to the result set, the query will have to be rewritten.
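
    As a sketch of what such a rewrite could look like with vector matching, the variant below assumes several hosts are scraped and that both metrics carry an instance label, with exactly one machine_cpu_cores series per instance (check this against your own setup):

    sum by (instance, name) (
      rate(container_cpu_usage_seconds_total{name=~".+"}[$__rate_interval])
      / on(instance) group_left() machine_cpu_cores
    )

    Each container's rate is then divided by the core count of its own host instead of the total across all hosts.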