elasticsearch logging bosun

Return 0 if results aren't found from elasticsearch query


I'm running bosun to alert against an elasticsearch data set.

The scenario is that there are a number of cron jobs that do various things. If they execute successfully, they log a success message. If they die or fail to run for whatever reason, no success message is logged, and we need to know about it.

My question is how to get a 0 result if no record is found, rather than null. Here's the basic query:

nv(sum(escount(esls("logs"), "context.taskname", esand(esgte("context.elapsed_time", 0), esor(esquery("context.taskname", "Task1 or Task2 or Task3 or Task4"))), "360m", "360m", "")), 0)

If a given task has run in the interval specified, the query should return a non-zero value for the number of success messages the task has logged.

This works, but I want the alert to fire ONLY if the task hasn't run. The problem is that if Task1 hasn't run and logged a completion message, it's just dropped from the final grouping rather than returning a 0 count.

Is there a way to ensure that each task in the esor returns something, even if it's a zero value?
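The behaviour described above can be reproduced with a small plain-Python model. A grouped count only emits a group for a task that has at least one matching record, so a task that never logged has no entry at all rather than a 0. Filling in zeros requires a separate list of expected tasks. That list, and the log data below, are hypothetical and only for illustration. The query itself has no way of knowing which tasks should exist.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_by_task(task_names: Iterable[str]) -> Dict[str, int]:
    """Grouped count: a task gets a key only if it logged at least once."""
    return dict(Counter(task_names))


def fill_missing(counts: Dict[str, int], expected: List[str]) -> Dict[str, int]:
    """Report every expected task, using 0 for tasks absent from counts."""
    return {task: counts.get(task, 0) for task in expected}


# Hypothetical success-log entries for the last 360 minutes; Task1 never logged.
success_logs = ["Task2", "Task2", "Task3", "Task4"]
# Hypothetical list of tasks we expect to run (not something the query provides).
expected_tasks = ["Task1", "Task2", "Task3", "Task4"]

counts = count_by_task(success_logs)
filled = fill_missing(counts, expected_tasks)
not_run = [task for task, n in filled.items() if n == 0]

print(counts)   # Task1 is simply missing
print(filled)   # Task1 is reported with a count of 0
print(not_run)  # the tasks that should trigger the alert
```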


Solution

  • In your situation there are 3 aspects to monitor:

    1. Did the jobs run at all
    2. Did the jobs run with a successful result
    3. Did the jobs run with an unsuccessful result

    The Elasticsearch queries themselves don't matter in this case, so I have simulated their responses with the series function:

    alert zero_example {
        # success log messages
        $successful = sum(merge(series("job=task1", 0, 1), series("job=task2", 0, 1)))
        # error log messages
        $error = sum(merge(series("job=task1", 0, 0), series("job=task3", 0, 1)))
    
        # Warn if a job has no success messages or a non-zero number of error messages.
        # nv(..., 0) treats a job missing from one side of the || as 0, so a job
        # with no error messages at all is still evaluated instead of being dropped.
        warn = nv($successful == 0, 0) || nv($error != 0, 0)
    
        # The final case is that a job hasn't logged anything. As long as the alert
        # has seen that job at least once, Bosun will mark it as "unknown" when
        # it disappears from the result set.
    }
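    To see why the nv() calls matter, here is a rough Python model of how the warn expression combines the two result sets. It assumes a simplified join in which groups are matched by exact tag string, and a group missing from one side is dropped unless that side supplies a default, which stands in for nv(set, default). Bosun's real join rules (for example subset tag matching and how unjoined groups are reported) are richer than this, so treat it as an illustration of the idea, not of Bosun's implementation.

```python
from typing import Callable, Dict, Optional

# A reduced result set: tag group -> value, loosely like a Bosun numberSet.
ResultSet = Dict[str, float]


def join(
    left: ResultSet,
    right: ResultSet,
    op: Callable[[float, float], float],
    left_default: Optional[float] = None,
    right_default: Optional[float] = None,
) -> ResultSet:
    """Apply op group by group (simplified: exact tag-string matching).

    A group missing from one side is dropped unless that side has a
    default, which stands in for wrapping it in nv(set, default).
    """
    out: ResultSet = {}
    for group in sorted(set(left) | set(right)):
        lv = left.get(group, left_default)
        rv = right.get(group, right_default)
        if lv is None or rv is None:
            continue  # no default for the missing side: the group vanishes
        out[group] = op(lv, rv)
    return out


def logical_or(a: float, b: float) -> float:
    return 1.0 if (a != 0 or b != 0) else 0.0


# Values after sum(), matching the series() calls in the alert.
successful: ResultSet = {"job=task1": 1.0, "job=task2": 1.0}
error: ResultSet = {"job=task1": 0.0, "job=task3": 1.0}

# $successful == 0 and $error != 0, evaluated per group.
no_success = {g: 1.0 if v == 0 else 0.0 for g, v in successful.items()}
has_error = {g: 1.0 if v != 0 else 0.0 for g, v in error.items()}

# warn = nv($successful == 0, 0) || nv($error != 0, 0)
warn = join(no_success, has_error, logical_or, left_default=0.0, right_default=0.0)
# The same expression without nv(): only task1 appears on both sides.
warn_without_nv = join(no_success, has_error, logical_or)

print(warn)             # task3 warns because it logged an error
print(warn_without_nv)  # task2 and task3 are lost
```

    With both defaults in place every job seen on either side is evaluated, and only task3 warns. Without them, this model keeps only task1, which is the same "dropped from the grouping" effect the question describes.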