I created a systemd service that runs on our system, and I want to monitor it with the Telegraf agent that is already installed on the instance. The agent currently monitors the basic infrastructure metrics, and I need to add monitoring for the new service.
I couldn't find any example of how to do this, which is strange: I would expect Telegraf to have some sort of plugin for something that basic.
My service runs a Python script that doesn't expose any port, so I can't do a normal HTTP health check.
Any help will be appreciated.
So I found that there is indeed a plugin that monitors systemd services. It is called systemd_units [1].
This is the configuration I've implemented:
# Gather systemd units state
[[inputs.systemd_units]]
## Set timeout for systemctl execution
timeout = "1s"
# Filter for a specific unit type, default is "service", other possible
# values are "socket", "target", "device", "mount", "automount", "swap",
# "timer", "path", "slice" and "scope":
unittype = "service"
# Filter for a specific pattern, default is "" (i.e. all), other possible
# values are valid pattern for systemctl, e.g. "a*" for all units with
# names starting with "a"
pattern = ""
## pattern = "telegraf* influxdb*"
## pattern = "a*"
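With pattern = "" the plugin reports every service unit on the host. If only the one service matters, the pattern can be narrowed to it. This is a minimal sketch, assuming the unit is named myservice.service (the name that appears in my query); adjust it to your own unit name:

```toml
# Sketch: gather the state of a single systemd unit instead of all services.
[[inputs.systemd_units]]
  ## Timeout for the systemctl call (same value as in my config)
  timeout = "1s"
  ## Only look at service units
  unittype = "service"
  ## Match just this unit; "myservice.service" is an assumed name, replace it
  pattern = "myservice.service"
```

To check the output before restarting the agent, something like `telegraf --config /etc/telegraf/telegraf.conf --input-filter systemd_units --test` should print the gathered metrics once; the config path here is an assumption and may differ on your instance.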
Once the metrics were in InfluxDB, this is the query I used to extract the data I needed:
from(bucket: "veeva")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_field"] == "active_code")
|> filter(fn: (r) => r["_measurement"] == "systemd_units")
|> filter(fn: (r) => r["active"] == "active")
|> filter(fn: (r) => r["host"] == "10.192.21.66")
|> filter(fn: (r) => r["name"] == "myservice.service")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
And this is how it looks in Grafana: [screenshot]

[1]: https://docs.influxdata.com/telegraf/v1.22/plugins/#systemd_units