[SOLVED] How to monitor a systemd service using telegraf?


I created a systemd service that runs on our system, and I want to monitor it with the Telegraf agent already installed on the instance. The agent currently monitors basic infrastructure metrics, and I need to add monitoring for the new service.

I couldn't find any example of how to do this, which is strange; I would expect Telegraf to have a plugin for something this basic.

My service runs a Python script that doesn't expose any port, so I can't do a normal HTTP health check.

Any help would be appreciated.

Solution

I found that there is indeed a plugin that monitors systemd services. It is called systemd_units.

This is the configuration I've implemented:

# Gather systemd units state
[[inputs.systemd_units]]
  ## Set timeout for systemctl execution
  timeout = "1s"

  # Filter for a specific unit type, default is "service", other possible
  # values are "socket", "target", "device", "mount", "automount", "swap",
  # "timer", "path", "slice" and "scope":
  unittype = "service"

  # Filter for a specific pattern, default is "" (i.e. all), other possible
  # values are valid pattern for systemctl, e.g. "a*" for all units with
  # names starting with "a"
  pattern = ""
  ## pattern = "telegraf* influxdb*"
  ## pattern = "a*"
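If only one unit matters, the pattern can be narrowed so Telegraf does not report every service on the host. A minimal sketch, assuming the unit is named myservice.service (substitute the name of your own unit):

```toml
# Gather the state of a single systemd unit only
[[inputs.systemd_units]]
  ## Set timeout for systemctl execution
  timeout = "1s"

  # Only look at service units
  unittype = "service"

  # Assumed unit name, used here for illustration; replace with your own
  pattern = "myservice.service"
```

To check the result before restarting the agent, something like `telegraf --config /etc/telegraf/telegraf.conf --input-filter systemd_units --test` should print the gathered metrics once without sending them to the output. The config path shown is the usual package default and is an assumption; adjust it to your installation.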

After getting the metrics into InfluxDB, this is the query I used to extract the data I needed:

from(bucket: "veeva")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_field"] == "active_code")
  |> filter(fn: (r) => r["_measurement"] == "systemd_units")
  |> filter(fn: (r) => r["active"] == "active")
  |> filter(fn: (r) => r["host"] == "10.192.21.66")
  |> filter(fn: (r) => r["name"] == "myservice.service")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

And this is how it looks in Grafana:

https://docs.influxdata.com/telegraf/v1.22/plugins/#systemd_units