I set up Prometheus and the blackbox exporter. Here are the configs.
root@monitor-1:~# cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    scrape_interval: 5s
    static_configs:
      - targets:
          - http://wiki.itsmwork.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.20.202:9115
root@monitor-1:~# cat /etc/prometheus/blackbox.yaml | more
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"
      no_follow_redirects: false
      fail_if_ssl: false
      tls_config:
        insecure_skip_verify: true
I probed the HTTP site manually through the blackbox exporter, and it returned probe_success 0, which was expected.
root@monitor-1:~# curl "http://localhost:9115/probe?target=wiki.itsmwork.com&module=http_2xx" | grep -v '^#'
probe_dns_lookup_time_seconds 0.002698265
probe_duration_seconds 0.00308218
probe_failed_due_to_regex 0
probe_http_content_length 0
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 0
probe_http_uncompressed_body_length 0
probe_http_version 0
probe_ip_addr_hash 0
probe_ip_protocol 0
probe_success 0
But when I check the same target in the Prometheus UI, up{instance="http://wiki.itsmwork.com",job="blackbox"} is always 1.
How can I determine what the problem is?
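Be careful not to mix up the up and probe_success metrics when dealing with the blackbox exporter. The first says that the exporter itself is reachable by Prometheus; the latter says whether the probe that the blackbox exporter itself runs against the target succeeded. So the combination you get is up = 1 (the exporter is reachable) and probe_success = 0 (the probe of http://wiki.itsmwork.com failed).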
This also matches your manual test: the request to the blackbox_exporter instance (your curl command) works, but the payload reports a probe failure. So, for your dashboards, you should always combine the up metric with probe_success if you want to reason about the probed system, because there is also the opposite scenario: the system to be monitored is running properly but the blackbox exporter job isn't. You would spot that case by the up metric switching to 0.
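A minimal sketch of that combination as a shell check, assuming bash and awk are available. blackbox_status is a hypothetical helper name (not part of Prometheus or the blackbox exporter); it reads the /probe payload on stdin and takes the value of up that Prometheus recorded for the same scrape as its only argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine Prometheus' up value for the blackbox job
# (argument $1) with probe_success from a /probe payload read on stdin.
blackbox_status() {
  local up="$1" success
  # If Prometheus could not scrape the exporter, probe_success says nothing
  # about the target, so report the exporter problem first.
  if [ "$up" != "1" ]; then
    echo "exporter unreachable"
    return
  fi
  # Take the value of the probe_success sample; "# HELP"/"# TYPE" lines
  # start with "#" in field 1 and are therefore skipped.
  success=$(awk '$1 == "probe_success" { print $2 }')
  # Anything other than 1 (including a missing sample) counts as a failed probe.
  if [ "$success" = "1" ]; then
    echo "target up"
  else
    echo "target down"
  fi
}
```

Usage, e.g. curl -s "http://localhost:9115/probe?target=wiki.itsmwork.com&module=http_2xx" | blackbox_status 1, which with the payload shown in the question (probe_success 0) prints "target down". In PromQL terms the two failure cases correspond to alerting separately on up{job="blackbox"} == 0 and probe_success{job="blackbox"} == 0.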