Assume I have a list of losses plotted in the following KDE plot:
If the goal is to spot the outliers, the best threshold would clearly be around 0.75, where the density is at its minimum (near zero) and the tail begins.
Given a list of loss values, how can I (as accurately as possible) set such a threshold at the beginning of the tail?
I would suggest sorting the list of loss values.
The threshold then depends on how many values there are and on what fraction of them you want to treat as the tail.
Suppose there are 100 values in the list and you want the threshold to cover 95% of them: after sorting, set the threshold at the point beyond which the last 5 values fall.
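The sorting approach can be sketched in Python as follows. This is only a minimal illustration: the function name `tail_threshold` and the parameter `tail_fraction` are made up for this example, and the data at the bottom is synthetic. Note that it fixes the share of points flagged as outliers (5% here) rather than locating the density minimum on the KDE plot, so the result will only land near 0.75 if roughly that share of the losses sits in the tail.

```python
import numpy as np


def tail_threshold(losses, tail_fraction=0.05):
    """Return the largest value that is NOT in the top `tail_fraction` of losses.

    Values strictly greater than the returned threshold form the tail.
    `tail_fraction` is a hypothetical parameter name chosen for this sketch.
    """
    values = np.sort(np.asarray(losses, dtype=float))
    n = len(values)
    if n == 0:
        raise ValueError("losses must not be empty")

    # Number of values to treat as the tail, e.g. 5 out of 100 for 0.05.
    n_tail = int(round(tail_fraction * n))
    if n_tail >= n:
        raise ValueError("tail_fraction leaves no values below the threshold")

    # Last sorted value before the tail starts.
    return values[n - n_tail - 1]


# Synthetic data, for illustration only: a bulk of small losses plus a few large ones.
rng = np.random.default_rng(0)
example_losses = np.concatenate([rng.normal(0.3, 0.1, 95), rng.normal(1.2, 0.1, 5)])
threshold = tail_threshold(example_losses, tail_fraction=0.05)
outliers = example_losses[example_losses > threshold]
print(threshold, len(outliers))
```

With 100 values and `tail_fraction=0.05`, the function returns the 95th sorted value, so the 5 largest values fall beyond the threshold, as described above.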