
Prometheus UI always returns 1 even though blackbox_exporter returns 0 when checked manually


I set up Prometheus and the blackbox exporter. Here are the configs.

root@monitor-1:~# cat /etc/prometheus/prometheus.yml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    scrape_interval: 5s
    static_configs:
      - targets:
        - http://wiki.itsmwork.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.20.202:9115
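
To see what the three relabel_configs rules in the 'blackbox' job do to each target, here is a minimal Python sketch that models them. The helpers relabel and scrape_url are hypothetical names used only for illustration. Prometheus applies the real rules internally and supports much more (regex, separator, other actions) than this simplified model. The sketch shows that the final __address__, which is the endpoint the up metric is about, is the exporter at 192.168.20.202:9115. The website itself is only passed along as the target URL parameter.

```python
from urllib.parse import urlencode

# Exporter address taken from the 'blackbox' job above.
EXPORTER_ADDRESS = "192.168.20.202:9115"


def relabel(target: str, module: str = "http_2xx") -> dict:
    """Simplified, hypothetical model of the three relabel rules in the job."""
    # Labels a static target in this job starts with (simplified).
    labels = {
        "__address__": target,
        "__scheme__": "http",
        "__metrics_path__": "/probe",
        "__param_module": module,  # from params: module: [http_2xx]
    }
    # Rule 1: copy __address__ into __param_target (sent as ?target=...).
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy __param_target into instance (the label shown in the UI).
    labels["instance"] = labels["__param_target"]
    # Rule 3: overwrite __address__ with the exporter's host:port.
    labels["__address__"] = EXPORTER_ADDRESS
    return labels


def scrape_url(labels: dict) -> str:
    """Build the URL that is actually scraped from the final label set."""
    params = {
        name[len("__param_"):]: value
        for name, value in labels.items()
        if name.startswith("__param_")
    }
    return (
        f"{labels['__scheme__']}://{labels['__address__']}"
        f"{labels['__metrics_path__']}?{urlencode(params)}"
    )


if __name__ == "__main__":
    final = relabel("http://wiki.itsmwork.com")
    print(final["instance"])  # http://wiki.itsmwork.com
    # http://192.168.20.202:9115/probe?module=http_2xx&target=http%3A%2F%2Fwiki.itsmwork.com
    print(scrape_url(final))
```

In other words, the instance label still says http://wiki.itsmwork.com, but the HTTP request Prometheus makes (and whose success up records) goes to the exporter.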


root@monitor-1:~# cat /etc/prometheus/blackbox.yaml | more
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"
      no_follow_redirects: false
      fail_if_ssl: false
      tls_config:
        insecure_skip_verify: true

I checked the HTTP site manually, and the probe returned probe_success 0, which was expected.

root@monitor-1:~# curl "http://localhost:9115/probe?target=wiki.itsmwork.com&module=http_2xx" | grep -v '^#'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2013  100  2013    0     0   294k      0 --:--:-- --:--:-- --:--:--  327k
probe_dns_lookup_time_seconds 0.002698265
probe_duration_seconds 0.00308218
probe_failed_due_to_regex 0
probe_http_content_length 0
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 0
probe_http_uncompressed_body_length 0
probe_http_version 0
probe_ip_addr_hash 0
probe_ip_protocol 0
probe_success 0
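
The same manual check can be scripted. The following is a minimal Python sketch, assuming the exporter listens on localhost:9115 as in the curl command; fetch_probe and parse_samples are hypothetical helper names. The parser is deliberately simplified and assumes label values contain no spaces, which holds for the output above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_probe(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Same request as the curl command above (exporter is "host:port")."""
    query = urlencode({"target": target, "module": module})
    # urlopen raises on connection errors or non-2xx responses, so a
    # returned body means the exporter itself answered.
    with urlopen(f"http://{exporter}/probe?{query}", timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> dict:
    """Map each sample's 'name{labels}' to its float value, skipping comments.

    Simplified: assumes label values contain no spaces.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        key, value = line.split()[:2]
        samples[key] = float(value)
    return samples
```

Usage would be parse_samples(fetch_probe("localhost:9115", "wiki.itsmwork.com"))["probe_success"]. The request succeeding is one signal (the exporter is reachable); the probe_success value inside the body is a separate one (the target probe result).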

But when I checked the same target in the Prometheus UI, up{instance="http://wiki.itsmwork.com",job="blackbox"} was always 1.

How can I determine what the problem is?


Solution

  • Be careful not to mix up up and probe_success when dealing with the blackbox exporter. The first metric says whether the exporter itself is reachable (Prometheus scraped it successfully); the second is about the target that the blackbox exporter probes. So the combination you get is:

      up = 1: Prometheus successfully scraped the blackbox exporter at 192.168.20.202:9115.
      probe_success = 0: the exporter's probe of http://wiki.itsmwork.com failed.

    This also matches your manual test: the request to the blackbox_exporter instance (your curl command) works, but the payload reports a probe failure. So on your dashboards, always combine the up metric with probe_success if you want to reason about the probed system. The opposite scenario is also possible: the monitored system is running properly but the blackbox exporter job isn't, which you would spot by up switching to 0.
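
The interpretation above can be written down as a small decision table. This is a minimal Python sketch; probe_state and its return strings are hypothetical names chosen for illustration, and in practice the same check would usually live in a dashboard query or alert rule rather than in Python.

```python
from typing import Optional


def probe_state(up: float, probe_success: Optional[float]) -> str:
    """Interpret one (up, probe_success) pair for a blackbox job instance."""
    if up == 0:
        # Prometheus could not scrape the exporter, so there is no fresh
        # probe result and nothing can be concluded about the target.
        return "exporter down"
    if probe_success is None:
        # The exporter answered, but no probe_success sample was found.
        return "no probe result"
    if probe_success == 1:
        return "target up"
    # Exporter reachable but the probe failed: the case in the question.
    return "target down"


if __name__ == "__main__":
    print(probe_state(1, 0))     # target down
    print(probe_state(1, 1))     # target up
    print(probe_state(0, None))  # exporter down
```

Checking up first matters: a failed scrape of the exporter says nothing about the website, so it should not be reported as the website being down.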