Tags: kubernetes, nginx-ingress, statsd, amazon-eks, collectd

Collectd not accepting metrics with nginx-ingress tag


I am running a pod with two containers: global-metrics-generator and collectd-statsd. In the global-metrics-generator container, a Python script runs as a cron job; it fetches all pods in the k8s cluster and pushes CPU and memory metrics for each pod to localhost:28125, where a collectd process inside the collectd-statsd container is listening. The script uses the Python statsd client to push these metrics from global-metrics-generator to collectd-statsd.
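
For reference, the push side looks roughly like this (a minimal sketch assuming the statsd Python package; the value is illustrative, and the real script loops over every pod returned by the k8s API):

    # Sketch of the push side, assuming the `statsd` package (pip install statsd).
    from statsd import StatsClient

    # collectd's statsd plugin listens on this port in the sibling container
    client = StatsClient(host="localhost", port=28125)

    # illustrative push; the real script sends one gauge per pod per resource
    client.gauge("container.cpu.usage", 0.42)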

I have been using this setup for close to a year and it has worked seamlessly. But recently I introduced an nginx-ingress pod into my system, and even though I can see the Python script sending CPU/memory metrics for the nginx-ingress pod to collectd, the metric never shows up inside the /var/lib/collectd/ folder in the collectd-statsd container. For every other pod in my k8s cluster this works as expected.

Strange thing:

When I change the nginx-ingress pod name to any name that does not contain the word ingress, e.g. nginx-ingres (note the single s), the metric is collected inside the /var/lib/collectd/ folder.

The structure of my final metric with custom tags attached looks like:

What works - [container=nginx-ingres,name=nginx-ingres-6bf8b67bb7-ndmjn,replicaset=nginx-ingres-6bf8b67bb7,ip=100.101.28.65,host_ip=10.36.40.229,Namespace=nginx-ingress]container.cpu.usage

What doesn't work - [container=nginx-ingress,name=nginx-ingress-599c78d7b6-psxns,replicaset=nginx-ingress-599c78d7b6,ip=100.102.33.199,Namespace=nginx-ingress,host_ip=10.36.40.170]container.cpu.usage
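
The bracketed prefix is assembled in the script along these lines (a rough sketch; build_metric_name is a hypothetical helper, but the dimension names match the examples above):

    # Hypothetical helper that builds the collectd-style tagged metric name
    def build_metric_name(metric, dims):
        tags = ",".join(f"{k}={v}" for k, v in dims.items())
        return f"[{tags}]{metric}"

    name = build_metric_name("container.cpu.usage", {
        "container": "nginx-ingress",
        "name": "nginx-ingress-599c78d7b6-psxns",
        "replicaset": "nginx-ingress-599c78d7b6",
        "ip": "100.102.33.199",
        "Namespace": "nginx-ingress",
        "host_ip": "10.36.40.170",
    })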

I couldn't find anything online about this. Is the word ingress reserved in collectd/statsd? If so, why am I able to pass the Namespace=nginx-ingress tag along with a metric?


Solution

  • I figured it out. Interestingly, the issue has nothing to do with the word ingress.

    We add a lot of custom dimensions to every metric we emit from our application. Collectd joins the metric name and the key=value pairs (dimensions) passed with the metric, and stores the result as a file inside the /var/lib/collectd/ directory, e.g. [container=nginx-ingres,name=nginx-ingres-6bf8b67bb7-ndmjn,replicaset=nginx-ingres-6bf8b67bb7,ip=100.101.28.65,host_ip=10.36.40.229,Namespace=nginx-ingress]container.cpu.usage. These files contain the actual values of the emitted metrics. Since all of the dimensions are embedded in the file name, the file names are generally long.

    Most Linux filesystems limit a single file name to 255 characters. Read: https://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs. Adding the dimension container=nginx-ingress on top of the dimensions already attached to all of our metrics pushed the file name to 256 characters. While debugging, before I knew about the file name length limit, I suspected ingress might be a reserved word and tried ingres instead. That worked, but only because it shortened the file name to 255 characters.

    We solved the issue by removing some unnecessary custom tags from all of our metrics; a pre-flight length check like the one sketched below would have caught this sooner.
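
    A quick way to catch this ahead of time, sketched below. It assumes collectd writes one file per tagged metric using the full bracketed string as the file name, and that the check runs inside the collectd-statsd container (the example metric here is shorter than our production ones, which carried enough extra dimensions to cross the limit):

        import os

        # Shortened example from the question; our production metrics had more dimensions.
        metric = ("[container=nginx-ingress,name=nginx-ingress-599c78d7b6-psxns,"
                  "replicaset=nginx-ingress-599c78d7b6,ip=100.102.33.199,"
                  "Namespace=nginx-ingress,host_ip=10.36.40.170]container.cpu.usage")

        # NAME_MAX of the filesystem backing collectd's data directory
        # (255 on most Linux filesystems)
        name_max = os.pathconf("/var/lib/collectd", "PC_NAME_MAX")

        if len(metric.encode()) > name_max:
            print(f"file name would exceed NAME_MAX ({name_max}); drop some dimensions")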