grafana influxdb telegraf influxql

How to count tags rather than metrics


I'm running the Telegraf-DS Helm chart to gather metrics/stats from a Kubernetes cluster, and I'm plotting this data in Grafana. (The list of available metrics/tags is here.)

I'd like to chart how many pods (in a given namespace) are running on each node, but I can't seem to craft a suitable query from the available data. The best I have come up with is:

SELECT count(distinct("memory_page_faults")) FROM "kubernetes_pod_container" WHERE ("namespace" = 'foobar') AND $timeFilter GROUP BY time($__interval), "node_name" fill(null)

This sort of works, but it shows short spikes where extra pods are counted that I'm 100% sure don't exist.


I think I could make it work if I could work out how to count the distinct pod_name tag values grouped by the node_name tag.
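To make the goal concrete, here is a minimal Python model (not InfluxDB code) of what the desired query should compute: bucket each sample by time, then count the distinct pod_name values per node_name in each bucket, restricted to one namespace. By contrast, count(distinct("memory_page_faults")) counts distinct values of a field, which is not the same thing. The sample data and the names Sample and pods_per_node are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    """One hypothetical point from the kubernetes_pod_container measurement."""
    ts: int          # timestamp in seconds
    node_name: str   # tag
    pod_name: str    # tag
    namespace: str   # tag


def pods_per_node(samples, interval, namespace):
    """Count distinct pod_name tags per (time bucket, node_name).

    Rough model of GROUP BY time(interval), "node_name" with a distinct
    count over the pod_name tag, restricted to a single namespace.
    """
    buckets = defaultdict(set)
    for s in samples:
        if s.namespace != namespace:
            continue
        bucket_start = s.ts - s.ts % interval  # epoch-aligned bucket start
        buckets[(bucket_start, s.node_name)].add(s.pod_name)  # set dedupes pods
    return {key: len(pods) for key, pods in sorted(buckets.items())}


samples = [
    Sample(0, "node-a", "web-1", "foobar"),
    Sample(0, "node-a", "web-2", "foobar"),
    Sample(0, "node-b", "web-3", "foobar"),
    Sample(0, "node-b", "db-1", "other"),     # other namespace: ignored
    Sample(10, "node-a", "web-1", "foobar"),  # same pod again: counted once
]
print(pods_per_node(samples, 60, "foobar"))
# {(0, 'node-a'): 2, (0, 'node-b'): 1}
```

Because pods are collected into a set, several points (or several containers) from the same pod in one bucket still count as one pod.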


Solution

  • In my opinion, your query is almost right.

    I think that within each $__interval period your pods are changing, so both the old and the new pods are counted.

    As a quick fix, I would suggest manually setting a lower interval in place of $__interval; ideally, use the raw data collection interval. If the data comes from Telegraf, that is 10s by default.

    If the query becomes slow with such a small interval, you can use subqueries, or a selector such as first() in a subquery, but that will reduce the accuracy of your data.
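The churn explanation above can be illustrated with a small Python model (not InfluxDB code). It assumes a hypothetical rolling restart on one node: an old pod reports until t=20 and its replacement from t=30, one sample every 10s. The pod names, timestamps and the helper pods_per_node are hypothetical. With a 60s bucket both pod names land in the same bucket, so the node briefly appears to run an extra pod; with 10s buckets, matching Telegraf's default collection interval, no bucket contains both.

```python
from collections import defaultdict


def pods_per_node(samples, interval):
    """Count distinct pod names per (bucket start, node); samples are (ts, node, pod)."""
    buckets = defaultdict(set)
    for ts, node, pod in samples:
        buckets[(ts - ts % interval, node)].add(pod)
    return {key: len(pods) for key, pods in sorted(buckets.items())}


# Hypothetical rolling restart on node-a, one sample every 10s:
# "web-old" reports at t=0..20, its replacement "web-new" at t=30..110.
samples = [(t, "node-a", "web-old") for t in range(0, 30, 10)]
samples += [(t, "node-a", "web-new") for t in range(30, 120, 10)]

coarse = pods_per_node(samples, 60)  # e.g. a wide $__interval
fine = pods_per_node(samples, 10)    # matches the raw 10s collection interval

print(max(coarse.values()))  # 2: old and new pod share the bucket starting at 0
print(max(fine.values()))    # 1: no 10s bucket contains both pods
```

The spike only appears in buckets that span the moment one pod is replaced by another, which is consistent with the short spikes described in the question.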