python, python-3.x, anomaly-detection

Finding a clever way to set a threshold given a list of loss values


Assume I have a list of losses plotted in the following KDE plot:

[Image: KDE plot of the loss values]

If the goal is to spot the outliers, the best threshold is clearly around 0.75: the density there is at its minimum (near zero), and that is where the tail begins.

Given a list of loss values, how can I (as accurately as possible) set such a threshold at the beginning of the tail?


Solution

  • I would suggest sorting the list of loss values.

    The threshold then depends on how many values there are and on what fraction of them you are willing to treat as outliers.

    Suppose there are 100 values in the list and you want to keep 95% of them as normal: after sorting, the threshold is the point beyond which the last 5 values fall. In other words, it is the 95th percentile of the losses.
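The sorting approach above could be sketched roughly as follows. This is only a minimal illustration: the function name `tail_threshold` and the parameter `keep_fraction` are made up for this example, and the 95% cut-off is an assumption you would tune to your data rather than something read off the density.

```python
def tail_threshold(losses, keep_fraction=0.95):
    """Return the largest loss still treated as normal.

    Hypothetical helper: sorts the losses and keeps the lowest
    `keep_fraction` of them; everything above the returned value
    is treated as an outlier.
    """
    if not losses:
        raise ValueError("losses must not be empty")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    ordered = sorted(losses)
    # Number of values kept as normal (at least one, at most all of them).
    n_keep = int(round(len(ordered) * keep_fraction))
    n_keep = min(max(n_keep, 1), len(ordered))
    return ordered[n_keep - 1]


def find_outliers(losses, keep_fraction=0.95):
    """Return the losses that lie strictly above the threshold."""
    threshold = tail_threshold(losses, keep_fraction)
    return [x for x in losses if x > threshold]


# 100 evenly spaced example losses: 0.01, 0.02, ..., 1.00
example_losses = [i / 100 for i in range(1, 101)]
threshold = tail_threshold(example_losses)
outliers = find_outliers(example_losses)
print(threshold, len(outliers))
```

With 100 distinct values and `keep_fraction=0.95`, the last 5 sorted values end up above the threshold. Note that this fixes the share of outliers in advance; it does not look for the density minimum from the plot, and tied values at the threshold are all kept as normal.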