I'm using Telegraf, InfluxDB and Grafana to build a monitoring system for a distributed application. The first thing I want to do is count the number of Java processes running on a machine.
But when I run my query, the number of processes is nearly random (always between 1 and 8, instead of always being 8).
I think there is a mistake in my Telegraf configuration, but I don't see where. I tried changing the interval,
but nothing changed: it seems InfluxDB doesn't have all the data.
I'm running CentOS 7 and Telegraf v1.5.0 (git: release-1.5 a1668bbf).
All the Java processes I want to count:
[root@localhost ~]# pgrep -f java
10665
10688
10725
10730
11104
11174
16298
22138
My telegraf.conf :
[global_tags]
# Configuration for telegraf agent
[agent]
interval = "5s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = "my_server"
omit_hostname = false
My input.conf:
# Read metrics about disk usage
[[inputs.disk]]
fielddrop = [ "inodes*" ]
mount_points=["/", "/workspace"]
# File
[[inputs.filestat]]
files = ["myfile.log"]
# Read the number of running Java processes
[[inputs.procstat]]
user = "root"
pattern = "java"
My request :
The response :
If you just want to count PIDs, the exec input is a good way to do it,
like this:
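[[inputs.exec]]
commands = ["pgrep -c java"] # command to execute
name_override = "the_name" # measurement name in InfluxDB
data_format = "value" # parse the output as a single value (stored in the field "value")
data_type = "integer" # the count is an integer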
For commands, use pgrep -c java without the -f option: -f matches against the full command line, so it can also count the pgrep command itself (and you end up with almost the same problem as with procstat).
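To see how the two variants behave on your own machine before wiring either into Telegraf, a small comparison like the one below may help. It is only a sketch: the variable names by_name and by_cmdline are made up for illustration, and the pattern java is taken from the question.

```shell
#!/bin/sh
# Compare the two pgrep variants discussed above.
# The variable names are illustrative; only pgrep and its -c / -f options are real.

# Without -f: match "java" against the process name only.
# pgrep -c prints 0 and exits with status 1 when nothing matches,
# so fall back to 0 if the command printed nothing at all.
by_name=$(pgrep -c java 2>/dev/null) || by_name=${by_name:-0}

# With -f: match "java" against the full command line.
by_cmdline=$(pgrep -f -c java 2>/dev/null) || by_cmdline=${by_cmdline:-0}

echo "matches on process name: $by_name"
echo "matches on command line: $by_cmdline"
```

If the two numbers differ, the extra matches under -f are processes that merely have "java" somewhere in their arguments. You can then check what Telegraf itself would report by running it once with the --test flag and --input-filter exec, which prints the gathered metric to stdout instead of sending it to InfluxDB.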
Solution found here