PromQL and node-exporter: Peak CPU usage during the last minute averaged over many servers


Based on this answer, I was trying to obtain the peak CPU usage during the last minute, averaged over many servers. Here is my query:

max_over_time(sum(1-rate(node_cpu_seconds_total{mode="idle",instance!="epgp003:9401"}[1m]))by(instance))

However, I get this error:

Error executing query: invalid parameter "query": 1:15: parse error: expected type range vector in call to function "max_over_time", got instant vector.

If I try this:

max_over_time((sum(1-rate(node_cpu_seconds_total{mode="idle",instance!="epgp003:9401"})[1m:]))by(instance))

I get this error:

Error executing query: invalid parameter "query": 1:107: parse error: unexpected <by>

The following query:

max_over_time((sum(1-rate(node_cpu_seconds_total{mode="idle",instance!="epgp003:9401"}[1m]))by(instance))[1m:])

It returns an array of values (one per instance) in the response, while I need only a single number (the maximum).


Solution

  • I believe the correct query is:

    max(
      max_over_time(
        (
          sum(
            1 - rate(node_cpu_seconds_total{mode="idle",instance!="epgp003:9401"}[1m])
          ) by (instance)
        )[1m:]
      )
    )
    

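    Here we first apply max_over_time to the [1m:] subquery. The subquery evaluates the inner sum(...) by (instance) expression at regular steps over the last minute (at the global evaluation interval, because no resolution is given after the colon) and turns the result into the range vector that max_over_time requires. This gives the peak value for each server over that minute. The outer max then reduces those per-server peaks to a single maximum value.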
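To make the two aggregation stages concrete, here is a small Python sketch that mimics them on made-up numbers. It is not how Prometheus is implemented: it assumes the inner expression has already been evaluated at each subquery step, and the instance names and sample values are hypothetical. It only reproduces what max_over_time and the outer max do with those per-step values.

```python
from typing import Dict, List

# Hypothetical per-step values of the inner PromQL expression
#   sum(1 - rate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance)
# as a [1m:] subquery would produce them: one list of samples per instance.
Series = Dict[str, List[float]]


def max_over_time(series: Series) -> Dict[str, float]:
    """Peak value per instance over the window (what max_over_time(...[1m:]) does).

    Instances with no samples are dropped, as PromQL drops empty series.
    """
    return {inst: max(vals) for inst, vals in series.items() if vals}


def aggregate_max(per_instance: Dict[str, float]) -> float:
    """Collapse the per-instance vector into one number (the outer max())."""
    return max(per_instance.values())


def aggregate_avg(per_instance: Dict[str, float]) -> float:
    """Mean of the per-instance peaks (what avg() in place of max() would give)."""
    return sum(per_instance.values()) / len(per_instance)


if __name__ == "__main__":
    # Made-up subquery samples for three hypothetical servers.
    samples: Series = {
        "server-a:9100": [0.8, 1.6, 1.2, 0.9],
        "server-b:9100": [2.1, 2.4, 1.9, 2.0],
        "server-c:9100": [0.3, 0.5, 0.4, 0.7],
    }
    peaks = max_over_time(samples)
    print(peaks)                 # one value per instance, like the third query in the question
    print(aggregate_max(peaks))  # a single number, like the query above
    print(aggregate_avg(peaks))  # average of the per-server peaks
```

Swapping aggregate_max for aggregate_avg corresponds to replacing the outer max with avg in PromQL, which yields the average of the per-server peaks rather than the single highest one.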