machine-learning unsupervised-learning anomaly-detection isolation-forest

How to know which features cause anomalies when training an isolation forest model


I'm training an unsupervised isolation forest model on a DataFrame with 10 features. The model performs well and detects anomalies. When an anomaly is caught, I want to know which feature(s) caused it. Is there any way to do this? If not, is there another model that allows it?


Solution

  • SHAP values, computed with the shap library, can be used for this. See this answer for an example.

    After computing the SHAP values for your data points with an explainer, you can use a waterfall plot to see how each feature contributed to the model's output for a given point.

    # Waterfall plot of per-feature contributions for the first data point
    shap.plots.waterfall(shap_values[0])
    

    The plot starts from the explainer's expected (base) value and shows how much each feature pushes the output up or down for that data point.