I have a custom Python plugin that I am using to pull data into Telegraf. It prints out line protocol output, as expected.
In my Ubuntu 18.04 environment, when this plugin is run I see a single line in my logs:
2020-12-28T21:55:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py': Traceback (most recent call last):...
That is it. I can't figure out how to get the actual traceback.
If I run sudo -u telegraf /usr/bin/telegraf -config /etc/telegraf/telegraf.conf, the plugin works as expected. It polls and loads data exactly as it should.
I'm not sure how to move forward with troubleshooting this error when Telegraf is executing the plugin on its own.
I have restarted the telegraf service. I have verified permissions (and I think that the execution above shows that it should work).
A few additional details based on the comments and answers received:
Ownership of the plugin file is telegraf:telegraf. The error does not seem to indicate that Telegraf can't see the file that is being executed, but rather that something within the file is failing when Telegraf executes the plugin.
Plugin code (/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py):
from google.auth.transport.requests import Request
from google.oauth2 import id_token
import requests
import os

RUNTIME_URL = INTERNAL_URL
MEASUREMENT = "MY_MEASUREMENT"
CREDENTIALS = "GOOGLE_SERVICE_FILE.json"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = CREDENTIALS  # ENV VAR REQUIRED BY GOOGLE CODE BELOW
CLIENT_ID = VALUE_FROM_GOOGLE
exclude_fields = ["name", "version"]  # Don't try to put these into influxdb from json response


def make_iap_request(url, client_id, method="GET", **kwargs):
    # Code provided by Google docs
    # Set the default timeout, if missing
    if "timeout" not in kwargs:
        kwargs["timeout"] = 90
    # Obtain an OpenID Connect (OIDC) token from metadata server or using service
    # account.
    open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
    # Fetch the Identity-Aware Proxy-protected URL, including an
    # Authorization header containing "Bearer " followed by a
    # Google-issued OpenID Connect token for the service account.
    resp = requests.request(method, url, headers={"Authorization": "Bearer {}".format(open_id_connect_token)}, **kwargs)
    if resp.status_code == 403:
        raise Exception("Service account does not have permission to " "access the IAP-protected application.")
    elif resp.status_code != 200:
        raise Exception(
            "Bad response from application: {!r} / {!r} / {!r}".format(resp.status_code, resp.headers, resp.text)
        )
    else:
        return resp.json()


def print_results(results):
    """
    Take the results of a Dolores call and print influx line protocol results
    """
    for item in results["workflow"]:
        line_protocol_line_base = f"{MEASUREMENT},name={item['name']}"
        values = ""
        for key, value in item.items():
            if key not in exclude_fields:
                values = values + f",{key}={value}"
        values = values[1:]
        line_protocol_line = f"{line_protocol_line_base} {values}"
        print(line_protocol_line)


def main():
    # RUNTIME_URL is the constant defined at the top of the script
    current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
    print_results(current_runtime)


if __name__ == "__main__":
    main()
Relevant portion of the telegraf.conf file:
[[inputs.exec]]
  ## Commands array
  commands = [
    "/my_company/plugins-enabled/plugin-*/poll_*.py",
  ]
Agent section of config file:
[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = ""
  omit_hostname = true
What do I do next?
The exec plugin is truncating your exception output at the first newline, which is why only "Traceback (most recent call last):..." reaches the log. If you wrap your call to make_iap_request in a try/except block, and then print(e, file=sys.stderr) rather than letting the exception bubble all the way up, that should tell you more.
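import sys


def main():
    """
    Query URL and print line protocol
    """
    try:
        current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
        print_results(current_runtime)
    except Exception as e:
        # Put the exception message on stderr as a short message instead of a multi-line traceback
        print(e, file=sys.stderr)
        # Exit non-zero so the run is still reported as a failure, as it is today
        sys.exit(1)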
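Since only the first line of the error output shows up in the Telegraf log, another option is to collapse the whole traceback onto one line before writing it to stderr. The following is a minimal sketch, not part of the original script: one_line_traceback and run are hypothetical helper names, and failing_poll is a stand-in for the real make_iap_request call. A very long line may still be cut short in the log.

```python
import sys
import traceback


def one_line_traceback(exc):
    """Render an exception's full traceback as a single line (hypothetical helper)."""
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    # Each formatted entry can span several lines; split them all and
    # join with " | " so the result contains no newline at all.
    parts = [part.strip() for line in lines for part in line.splitlines()]
    return " | ".join(part for part in parts if part)


def failing_poll():
    # Stand-in for the real request; raises so there is something to report.
    raise RuntimeError("simulated failure")


def run(poll):
    """Call poll(); on failure write a one-line traceback to stderr and return 1."""
    try:
        poll()
    except Exception as e:
        print(one_line_traceback(e), file=sys.stderr)
        return 1
    return 0
```

In the plugin this would be used as sys.exit(run(main)), so that the exit status still signals the failure.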
Alternatively, your script could log error messages to its own log file, rather than passing them back to Telegraf. This would give you more control over what's logged.
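A rough sketch of the log-file approach, using Python's standard logging module. The log location is an assumption (the temp directory is used purely for illustration; pick a path the telegraf user can write to), and build_logger and run_logged are hypothetical names, not part of the original plugin.

```python
import logging
import os
import tempfile

# Assumption: any path writable by the telegraf user; the temp dir is illustrative only.
LOG_PATH = os.path.join(tempfile.gettempdir(), "poll_mysystem.log")


def build_logger(path=LOG_PATH):
    """Return a logger that appends timestamped records to the given file."""
    logger = logging.getLogger("poll_mysystem")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def run_logged(poll, logger):
    """Call poll(); on failure log the full traceback to the file and return 1."""
    try:
        poll()
    except Exception:
        # logger.exception records the message plus the complete traceback.
        logger.exception("poll failed")
        return 1
    return 0
```

The plugin's entry point could then become sys.exit(run_logged(main, build_logger())) after importing sys, which keeps the full multi-line traceback in the script's own file regardless of what Telegraf shows.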
I suspect you're running into an environment issue, where there's something different about how you're running it. If not permissions, it could be environment variable differences.
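One way to check for such differences is to have the script record what its environment looks like, once when launched by hand and once when launched by the service, and compare the two. This is only a sketch under assumptions: environment_snapshot and snapshot_line are hypothetical helpers, and the fields chosen (effective user id, working directory, interpreter, PATH, and whether the credentials file resolves from the current directory) are guesses at what might differ.

```python
import json
import os
import sys


def environment_snapshot(credentials_path):
    """Collect details that often differ between a shell and a service."""
    return {
        "uid": os.geteuid(),
        "cwd": os.getcwd(),
        "python": sys.executable,
        "path": os.environ.get("PATH", ""),
        # A relative credentials path is resolved against the working directory.
        "credentials_path": os.path.abspath(credentials_path),
        "credentials_found": os.path.isfile(credentials_path),
    }


def snapshot_line(credentials_path):
    """Serialize the snapshot as one line of JSON, safe for single-line logs."""
    return json.dumps(environment_snapshot(credentials_path), sort_keys=True)
```

Writing snapshot_line(CREDENTIALS) to the script's own log file, or to stderr just before a failing exit, gives two lines that can be compared directly.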