I'm trying to monitor my Node.js API built with Express using Prometheus, but I'm having trouble scraping the metrics because the API runs in Docker Swarm with about 6 replicas. I tried configuring dns_sd_configs, but each instance ends up with its own time series for the same counter. I want to aggregate them to build charts in Grafana, such as 2XX requests, 5XX requests, etc.
The name of my service is backend-server, and I want to scrape metrics from port 9464 at the /api/metrics endpoint. I configured my prometheus.yaml as follows:
- job_name: 'dockerswarm'
  dockerswarm_sd_configs:
    - host: unix:///var/run/docker.sock
      role: tasks
  relabel_configs:
    # Only keep containers that should be running.
    - source_labels: [__meta_dockerswarm_task_desired_state]
      regex: running
      action: keep
    # Only keep containers with the specific service name.
    - source_labels: [__meta_dockerswarm_service_name]
      regex: backend-server
      action: keep
    - source_labels: [__meta_dockerswarm_node_address]
      target_label: __address__
      replacement: $1:9464/api/metrics
Prometheus isn't throwing any errors, but my service never appears in its list of targets...
root@srv:~# docker service ls | grep backend
z5bnz2t5riw8 backend-server replicated 6/6 xx/xx/backend-server:x *:3000->3000/tcp
root@srv:~# docker service ls | grep promethe
8zlh5kwfx8ks prometheus replicated 1/1 prom/prometheus:v2.52.0 *:9090->9090/tcp
I did get it to work by configuring it as follows instead:
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 1m
    static_configs:
      - targets:
          - cadvisor:8080

  - job_name: node
    scrape_interval: 1m
    static_configs:
      - targets: ['host.docker.internal:9100', 'victoria.consorcio.local:9100']

  - job_name: backend
    scrape_interval: 15s
    metrics_path: /victoria/api/metrics
    dns_sd_configs:
      - names:
          - 'tasks.backend-server'
        type: 'A'
        port: 9464
But this creates a separate counter series for each instance:
error_counter_total{instance="10.0.1.15:9464", job="backend", method="POST", status="401"}
error_counter_total{instance="10.0.1.16:9464", job="backend", method="POST", status="401"}
The Prometheus container is successfully scraping all 6 replicas, so the per-instance series are expected. You can simply use PromQL to aggregate and group the results in your Grafana dashboard.
For example, you can create a panel in Grafana with the following query to sum the count of 500 responses across all replicas:
sum(error_counter_total{job="backend", method="POST", status="500"})
You can check the other aggregation operators in the Prometheus documentation on querying operators.
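To build the 2XX/5XX panels you mention, the same idea extends to grouping by status class. A minimal sketch, assuming the status label always carries the numeric HTTP code (as in the samples above) and that you chart counters with rate():

# 5xx responses per second, summed across all replicas
sum(rate(error_counter_total{job="backend", status=~"5.."}[5m]))

# One aggregated series per status code, regardless of which replica served the request
sum by (status) (rate(error_counter_total{job="backend"}[5m]))

Since sum drops the instance label, Grafana draws one line per status (or status class) instead of one per replica. For the 2XX panel, apply the same pattern to whichever counter tracks successful requests (for example a request counter with the same status label), since error_counter_total presumably only counts errors.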