prometheus grafana prometheus-blackbox-exporter sre

PromQL queries for SLIs (Service Level Indicators) using Prometheus/Grafana and blackbox exporter


I want to achieve the specified SLIs (Service Level Indicators) for our HTTP endpoints, using blackbox exporter for probing. The targets are: 80% availability and latency less than 1s.

For latency, I figured I can use the query probe_http_duration_seconds > 1, but for availability I am not sure I am doing it correctly with quantile_over_time(0.80, probe_http_status_code)[1d] > 400. The > 400 condition is meant to check for HTTP errors, because I assume an HTTP status code above 400 is an error. Is this correct for my case? If not, please guide me. Thanks.


Solution

  • If you want to calculate the ratio of successful probes to the number of all registered probes:

    count_over_time((probe_http_status_code<400)[1d:])/count_over_time(probe_http_status_code[1d:])
    

    If you want to find the ratio of successful probes to the number of all possible probes (accounting for probes that were never executed, for example because blackbox_exporter was down):

    count_over_time((probe_http_status_code<400)[1d:])/1440
    

    where 1440 is the number of possible probes within the specified time range (1d / 1m = 1440, assuming scrape_interval is 1 minute; change it according to your setup).
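
To make the difference between the two ratios concrete, here is a minimal Python sketch of the same arithmetic, outside of Prometheus. The sample data and the function name `availability_ratios` are made up for illustration; the only things taken from the queries above are the `< 400` success condition and the 1440 expected probes per day (which assumes a 1 minute scrape_interval). Note that the `< 400` filter is copied as-is: as far as I know, blackbox exporter reports a status code of 0 when no HTTP response was received at all, and such samples would pass this filter, so it is worth checking how your setup behaves in that case.

```python
def availability_ratios(status_codes, expected_probes):
    """Return (ratio over registered probes, ratio over possible probes).

    status_codes: HTTP status codes of the probes that were actually recorded.
    expected_probes: how many probes should have run in the window
                     (1440 for 1d at an assumed 1 minute scrape interval).
    """
    # Same success condition as the PromQL filter probe_http_status_code < 400.
    successful = sum(1 for code in status_codes if code < 400)
    registered = len(status_codes)
    ratio_registered = successful / registered if registered else float("nan")
    ratio_possible = successful / expected_probes
    return ratio_registered, ratio_possible


# Hypothetical day: 1100 successful probes, 100 errors, 240 probes never ran.
samples = [200] * 1100 + [500] * 100
by_registered, by_possible = availability_ratios(samples, expected_probes=1440)
print(f"registered: {by_registered:.3f}, possible: {by_possible:.3f}")
```

With this made-up data the first ratio is above the 80% target and the second is below it, so which query you pick decides whether the missing probes count against your availability.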