I have set up a node-exporter DaemonSet on Kubernetes, as well as a service that points to the IPs of these node-exporter pods (I followed this tutorial).
When I run kubectl get endpoints -n monitoring, I can verify that the service correctly points to the 3 DaemonSet pods that were created.
After that, I added this config to the prometheus.yml file to scrape the node-exporter metrics:
- job_name: "node-exporter"
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_endpoints_name]
      regex: "node-exporter"
      action: keep
The problem appears when I apply this config and restart prometheus.service:
> systemctl status prometheus.service --no-pager --full
● prometheus.service - PromServer
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2024-06-18 16:24:18 UTC; 1min 41s ago
Main PID: 1441565 (prometheus)
Tasks: 10 (limit: 33613)
Memory: 38.2M
CPU: 325ms
CGroup: /system.slice/prometheus.service
└─1441565 /usr/local/bin/prometheus --web.enable-admin-api --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.654Z caller=head.go:755 level=info component=tsdb msg="WAL segment loaded" segment=158 maxSegment=159
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.655Z caller=head.go:755 level=info component=tsdb msg="WAL segment loaded" segment=159 maxSegment=159
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.655Z caller=head.go:792 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=7.37407ms wal_replay_duration=43.4001ms wbl_replay_duration=200ns total_replay_duration=51.456586ms
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.657Z caller=main.go:1040 level=info fs_type=EXT4_SUPER_MAGIC
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.657Z caller=main.go:1043 level=info msg="TSDB started"
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.657Z caller=main.go:1224 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.661Z caller=manager.go:317 level=error component="discovery manager scrape" msg="Cannot create service discovery" err="unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined" type=kubernetes config=node-exporter
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.664Z caller=main.go:1261 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=6.861558ms db_storage=1.6µs remote_storage=1.2µs web_handler=500ns query_engine=800ns scrape=3.46318ms scrape_sd=100.802µs notify=42.801µs notify_sd=13.7µs rules=2.858566ms tracing=8.1µs
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.664Z caller=main.go:1004 level=info msg="Server is ready to receive web requests."
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.664Z caller=manager.go:995 level=info component="rule manager" msg="Starting rule manager..."
From the output, I get this error:
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.661Z caller=manager.go:317 level=error component="discovery manager scrape" msg="Cannot create service discovery" err="unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined" type=kubernetes config=node-exporter
I haven't had any luck with my googling so far... Can anybody help me figure out which variables I am missing in the configuration?
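The error indicates that Prometheus cannot find the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, which it needs in order to reach the Kubernetes API server. Kubernetes injects these variables into every pod automatically, and when no api_server is set in kubernetes_sd_configs, Prometheus assumes it is running inside the cluster and tries to read them.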
You can follow these troubleshooting steps to resolve the issue:
Ensure that the Prometheus deployment contains the environment variables mentioned above. If they are missing, add them with the following configuration:
env:
  - name: KUBERNETES_SERVICE_HOST
    value: kubernetes.default.svc
  - name: KUBERNETES_SERVICE_PORT
    value: "443"
Ensure that the ServiceAccount used by Prometheus has the RBAC permissions required to access services and endpoints.
Check the Prometheus logs for any errors related to service discovery.
Cross-check that the namespace and the endpoints name in the prometheus.yml file are correct.
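As an illustration of the RBAC step above, here is a minimal sketch of a ClusterRole and binding that grant read access to the objects the endpoints role discovers. The names prometheus-discovery and prometheus, and the use of the monitoring namespace for the ServiceAccount, are assumptions for this example and should be adapted to the actual setup.

```yaml
# Sketch only: read-only access needed by Kubernetes service discovery.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery        # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus                # assumed ServiceAccount used by Prometheus
    namespace: monitoring           # assumed namespace
```

A namespaced Role and RoleBinding in monitoring would likely be enough if discovery is restricted to that one namespace, as it is in the question's config.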
Note:
If you are exporting the metrics to an external Prometheus that does not run inside the Kubernetes cluster, follow this link.
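The status output in the question shows Prometheus running as a systemd service on a VM, so the external case probably applies here. A rough sketch of what the scrape job could look like in that situation follows. The API server address and both file paths are placeholders, not values from the question, and the authorization block assumes a reasonably recent Prometheus release (older ones use bearer_token_file instead).

```yaml
scrape_configs:
  - job_name: "node-exporter"
    kubernetes_sd_configs:
      - role: endpoints
        # Assumption: placeholder address of the cluster's API server.
        api_server: https://k8s-api.example.internal:6443
        # Assumption: token of a ServiceAccount with read access,
        # copied to the Prometheus host.
        authorization:
          credentials_file: /etc/prometheus/k8s/token
        tls_config:
          # Assumption: the cluster CA certificate, copied to the Prometheus host.
          ca_file: /etc/prometheus/k8s/ca.crt
        namespaces:
          names:
            - monitoring
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: "node-exporter"
        action: keep
```

With an explicit api_server, Prometheus no longer looks for the in-cluster environment variables. Keep in mind that discovery returns pod IPs, so the targets are only scrapeable if those IPs are routable from the Prometheus host.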