
How does Kusto series_outliers() calculate anomaly scores?


Can someone please explain how the series_outliers() Kusto function calculates anomaly scores? I understand that, given a numeric array, it uses Tukey fences with a min percentile and a max percentile, but I would like to know the steps of the algorithm in more detail.

For example, given this table

let T = datatable(val:real)
[
   -3, 2.4, 15, 3.9, 5, 6, 4.5, 5.2, 3, 4, 5, 16, 7, 5, 5, 4
]

Using a 10%/90% quantile range, I found Q1 = 2.4, Q3 = 15, and IQR = 12.6. So how did the function derive these anomaly scores? [-1.9040785483608571, -0.10021466044004519, 1.3361954725339347, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.6702443406674186, 0.0, 0.0, 0.0, 0.0]


Solution

  • In that function the 10th and 90th percentiles are calculated with linear interpolation, so p10 = 2.7 and p90 = 11 (not the nearest data points 2.4 and 15), which gives IQR = 8.3.
    In addition, we normalize the score so that it is comparable to the standard Tukey's test (which uses the 25th and 75th percentiles), regardless of the specific percentiles used for calculating the IQR.
    The normalization assumes a normal distribution and looks at the score k = 1.5 (the common threshold for mild anomalies) when using p25 and p75. So, when using p10 and p90, we need to multiply the raw score by 2.772 to make sure that we get k = 1.5.
    Let's see how it works for -3.0, the first point in your sample data: k = (-3 - 2.7) / (11 - 2.7) * 2.772 = -1.904. Points above p90 are measured from p90 instead, e.g. 15 gives (15 - 11) / 8.3 * 2.772 = 1.336, and points between p10 and p90 get a score of 0.
    I hope it's clear now.
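
To check the arithmetic end to end, here is a minimal Python sketch of the steps described above. It is not the actual Kusto implementation: the helper names (percentile_linear, outlier_scores) are made up for illustration, the interpolation rule is assumed to be plain linear interpolation between neighbouring ranks, and the factor 2.772 is simply the rounded value quoted in the answer.

```python
# Sketch of the scoring steps described in the answer; NOT Kusto's real code.
# Assumption: 2.772 is the (rounded) normalization factor quoted above for
# the 10th/90th percentile range.
NORMALIZATION = 2.772


def percentile_linear(sorted_vals, pct):
    """Percentile using linear interpolation between the two nearest ranks."""
    pos = pct * (len(sorted_vals) - 1) / 100  # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def outlier_scores(values, min_pct=10, max_pct=90, factor=NORMALIZATION):
    """Score each value by its distance outside [p_lo, p_hi], in scaled IQR units."""
    ordered = sorted(values)
    p_lo = percentile_linear(ordered, min_pct)
    p_hi = percentile_linear(ordered, max_pct)
    spread = p_hi - p_lo  # the "IQR" for the chosen percentile range
    scores = []
    for v in values:
        if v < p_lo:
            scores.append((v - p_lo) / spread * factor)  # negative: below the low fence
        elif v > p_hi:
            scores.append((v - p_hi) / spread * factor)  # positive: above the high fence
        else:
            scores.append(0.0)  # inside the percentile range: not anomalous
    return scores


vals = [-3, 2.4, 15, 3.9, 5, 6, 4.5, 5.2, 3, 4, 5, 16, 7, 5, 5, 4]
scores = outlier_scores(vals)
print([round(s, 3) for s in scores])
```

With the sample data this gives p10 = 2.7 and p90 = 11, and the non-zero scores agree with the ones in the question to about three decimal places. The small remaining difference comes from using the rounded factor 2.772.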