
Count the number of running processes with Telegraf


I'm using Telegraf, InfluxDB and Grafana to build a monitoring system for a distributed application. The first thing I want to do is count the number of Java processes running on a machine.

But when I run my query, the number of processes it returns looks almost random (always between 1 and 8, instead of always 8).

I think there is a mistake in my Telegraf configuration, but I don't see where. I tried changing the interval, but nothing changed: it seems InfluxDB doesn't receive all the data.

I'm running CentOS 7 and Telegraf v1.5.0 (git: release-1.5 a1668bbf).

All the Java processes I want to count:

[root@localhost ~]# pgrep -f java
10665
10688
10725
10730
11104
11174
16298
22138

My telegraf.conf:

[global_tags]

# Configuration for telegraf agent
[agent]
  interval = "5s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = "my_server"
  omit_hostname = false

My input.conf:

# Read metrics about disk usage
[[inputs.disk]]
  fielddrop = [ "inodes*" ]
  mount_points = ["/", "/workspace"]

# File
[[inputs.filestat]]
  files = ["myfile.log"]

# Read the number of running Java processes
[[inputs.procstat]]
  user = "root"
  pattern = "java"

My query:

[screenshot of the query]

The response:

[screenshot of the response]


Solution

  • If you just want to count PIDs, a good way is to use the exec input like this:

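    [[inputs.exec]]
      commands = ["pgrep -c java"]  # command to execute; its stdout is parsed
      name_override = "the_name"    # measurement name in InfluxDB
      data_format = "value"         # parse stdout as a single value (field "value")
      data_type = "integer"         # type of that value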
    

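    For commands, use pgrep -c java without the -f option: -f ("full") matches the pattern against the entire command line instead of only the process name, so it can also count processes that merely mention java in their arguments, and you end up with almost the same problem as with procstat.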

    Solution found here
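A possible refinement, sketched under assumptions: pgrep -c prints 0 but exits with status 1 when nothing matches, so a run with no Java process looks like a failed command. A small wrapper that always prints one integer and always exits 0 avoids that. The script name count_procs.sh, its function count_procs and any install path are hypothetical, not part of the original answer.

```shell
#!/bin/sh
# count_procs.sh (hypothetical name): print how many running processes have
# a name matching the pattern given as $1 (default: java), as one integer on
# stdout, the shape the exec plugin's "value" data format parses.

count_procs() {
    # -c prints the number of matches instead of the PIDs.
    # No -f, so only the process name is matched, not the full command line.
    # pgrep exits with status 1 when nothing matches; fall back to 0 so the
    # function always prints an integer.
    count=$(pgrep -c -- "$1") || count=0
    echo "$count"
}

count_procs "${1:-java}"
```

You would then point the exec input at it, for example commands = ["/usr/local/bin/count_procs.sh java"] (hypothetical path), keeping data_format = "value" and data_type = "integer".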