Telegraf v1.0.1
I can no longer see the telegraf[._] metric tree after enabling the [[inputs.procstat]] plugin.
Telegraf installed successfully and the process is running. I'm using pretty much the default settings for the input and output plugins.
This is what I got:
ubuntu@jenkins:/tmp/giga_aks_testing/ansible$ grep -C 2 jenkins /etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf; echo ; ps -eAf|grep jenkins; echo; pgrep -f jenkins; echo; cat -n /var/log/telegraf/telegraf.log; echo date; echo; ps -eAf|grep telegraf; echo ; sudo service telegraf status
[[inputs.procstat]]
exe = "jenkins"
prefix = "pgrep_serviceprocess"
root 2875 3685 0 2016 pts/3 00:00:00 sudo su jenkins
root 2876 2875 0 2016 pts/3 00:00:00 su jenkins
jenkins 2877 2876 0 2016 pts/3 00:00:00 bash
jenkins 11645 1 0 2016 ? 00:00:01 /usr/bin/daemon --name=jenkins --inherit --env=JENKINS_HOME=/var/lib/jenkins --output=/var/log/jenkins/jenkins.log --pidfile=/var/run/jenkins/jenkins.pid -- /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
jenkins 11647 11645 0 2016 ? 05:33:22 /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
ubuntu 21973 26885 0 06:57 pts/0 00:00:00 grep --color=auto jenkins
2875
2876
11645
11647
1 2017-01-07T06:54:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
2 2017-01-07T06:55:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
3 2017-01-07T06:56:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
4 2017-01-07T06:57:00Z E! Error: procstat getting process, exe: [jenkins] pidfile: [] pattern: [] user: [] Failed to execute /usr/bin/pgrep. Error: 'exit status 1'
date
telegraf 19336 1 0 05:45 pts/0 00:00:04 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraftelegraf.d
ubuntu 21977 26885 0 06:57 pts/0 00:00:00 grep --color=auto telegraf
telegraf Process is running [ OK ]
ubuntu@jenkins:/tmp/giga_aks_testing/ansible$
Why does the log file show an error when the jenkins process is running and pgrep -f jenkins returns a valid result?
PS: The [[inputs.procstat]] plugin runs pgrep -f <pattern> when the pattern = option is used, and pgrep <executable> when the exe = option is used.
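The difference between the two pgrep modes can be checked by hand. Without -f, pgrep matches only the process name; with -f it matches the full command line. An exit status of 1 means no process matched. In the ps output above, PIDs 11645 and 11647 run /usr/bin/daemon and /usr/bin/java, so their process names are presumably daemon and java, with jenkins appearing only in the arguments. The sketch below is a rough check under that assumption; the function names are made up for illustration.

```shell
#!/bin/sh
# Translate a pgrep exit status into words (values from the procps pgrep man page).
describe_status() {
  case "$1" in
    0) echo "matched" ;;
    1) echo "no process matched" ;;
    2) echo "syntax error in the command line" ;;
    3) echo "fatal error" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Run pgrep in both modes for one pattern and report what each returned.
# compare_pgrep is a hypothetical helper, not part of Telegraf.
compare_pgrep() {
  pattern="$1"
  status=0
  pgrep "$pattern" >/dev/null 2>&1 || status=$?      # process name only
  echo "pgrep $pattern: $(describe_status "$status")"
  status=0
  pgrep -f "$pattern" >/dev/null 2>&1 || status=$?   # full command line
  echo "pgrep -f $pattern: $(describe_status "$status")"
}

compare_pgrep jenkins
```

If the first line reports no process matched while the second reports matched, the exe = option is probably a poor fit for a Java service like this one, and a pattern = entry may be closer to what is needed.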
The full /etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf file is:
[[inputs.procstat]]
exe = "jenkins"
prefix = "pgrep_serviceprocess"

[[inputs.procstat]]
exe = "telegraf"
prefix = "pgrep_serviceprocess"

[[inputs.procstat]]
exe = "sshd"
prefix = "pgrep_serviceprocess"
OK. This seems to be an open bug.
Telegraf won't complain about an [[inputs.procstat]] entry if it is the only one in the file.
If you specify multiple entries, Telegraf starts logging those errors even when the exe = <executable> processes are running (PS: the errors don't stop the Telegraf service from working, though).
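To see which entry actually triggers the error, each exe value can be fed to the same bare pgrep call by hand. The following is a minimal sketch, assuming the config keeps each exe = "..." setting on its own line as in the file above; list_exe_values and check_exe_entries are hypothetical helper names.

```shell
#!/bin/sh
# Print every exe = "..." value found in a procstat config file, one per line.
list_exe_values() {
  sed -n 's/^[[:space:]]*exe[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Run a bare pgrep (process-name match) for each value and show its exit status.
# Status 0 means at least one process matched, 1 means none did.
check_exe_entries() {
  list_exe_values "$1" | while IFS= read -r exe; do
    status=0
    pgrep "$exe" >/dev/null 2>&1 || status=$?
    echo "exe=$exe pgrep_exit_status=$status"
  done
}

conf=/etc/telegraf/telegraf.d/telegraf-custom-host-services-processes.conf
if [ -r "$conf" ]; then
  check_exe_entries "$conf"
fi
```

An entry that prints pgrep_exit_status=1 here would be expected to produce the same 'exit status 1' line in telegraf.log, regardless of how many entries the file contains.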
To fix the errors, this is what I did:
[[inputs.procstat]]
exe = "telegraf|.*"
prefix = "pgrep_serviceprocess"
Since Telegraf's [[inputs.procstat]] plugin uses pgrep, this runs the following at the OS level: pgrep "telegraf|.*".
You can also just use exe = "." (simplest) or exe = ".*", but in practice those make it hard to find out who is actually grepping all processes running on the system.
NOTE: .* matches every single process running on the machine, so use it only until we get a proper fix for this.
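Why the workaround silences the error can be shown without touching any real processes. pgrep patterns are extended regular expressions, so grep -E is used here as a stand-in on a made-up list of process names; count_matches is a hypothetical helper.

```shell
#!/bin/sh
# Count how many sample process names an extended regex matches.
# The four names are illustrative, not taken from a real host.
count_matches() {
  pattern="$1"
  printf '%s\n' sshd java cron telegraf | grep -Ec -- "$pattern"
}

echo "telegraf:    $(count_matches 'telegraf')"     # prints 1
echo "telegraf|.*: $(count_matches 'telegraf|.*')"  # prints 4
echo ".:           $(count_matches '.')"            # prints 4
```

Because the .* alternative matches any name, the pattern always finds something and pgrep never returns exit status 1. The cost is that the plugin then collects stats for every process rather than only the intended one.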
Related source code on GitHub: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/procstat/procstat.go
Related issue: https://github.com/influxdata/telegraf/issues/586
I still couldn't find out why the telegraf.x.x metrics are not available after I enabled the [[inputs.procstat]] input. Is that due to the separate file? I'm not sure. I can see the procstat.x.x metric tree, but the telegraf.x.x metric tree is no longer visible.
Or better, one can also use:
[[inputs.procstat]]
pattern = "."
prefix = "pgrep_serviceprocess"
The above will run pgrep -f "." where the pattern is . (to catch everything, i.e. every process/command/service running on the machine).
Or the following, which is not a scalable solution because you have to know the user in advance; on some boxes, Jenkins may be running as a user other than jenkins:
[[inputs.procstat]]
user = "jenkins"
prefix = "pgrep_serviceprocess"
The above will run pgrep -u "jenkins" where the user is jenkins (to catch every process/command/service running as that user).
To check whether jenkins or enhanceio is running, you can use the [[inputs.exec]] plugin as well. I simply used the [[inputs.filestat]] plugin, and it worked when I looked for the pid file of both tools: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/filestat
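For the [[inputs.exec]] route, one possible shape for the script is sketched below. It assumes a Linux host with /proc and that the exec input is configured to parse InfluxDB line protocol (data_format = "influx"). The measurement name service_process, the tag name, and the function check_pidfile are made up for illustration; the pid file path is the one from the ps output above.

```shell
#!/bin/sh
# Hypothetical helper for [[inputs.exec]]: prints one line-protocol point saying
# whether the PID stored in a pid file belongs to a live process.
check_pidfile() {
  name="$1"
  pidfile="$2"
  running=0
  if [ -r "$pidfile" ]; then
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
      ''|*[!0-9]*) ;;                      # empty or non-numeric: treat as not running
      *)
        # /proc/<pid> exists only while the process is alive (Linux-specific).
        if [ -d "/proc/$pid" ]; then
          running=1
        fi
        ;;
    esac
  fi
  echo "service_process,name=$name running=${running}i"
}

check_pidfile jenkins /var/run/jenkins/jenkins.pid
```

Unlike a plain file-exists check, this also catches a stale pid file left behind after the process has died.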