python machine-learning scikit-learn cluster-analysis data-mining

sklearn: Get Distance from Point to Nearest Cluster


I'm using clustering algorithms like DBSCAN.

It returns a 'cluster' labelled -1, which contains the points that are not part of any cluster. For each of these points I want to determine the distance to the nearest cluster, to get something like a metric for how abnormal the point is. Is this possible? Or are there any alternatives for this kind of metric?


Solution

  • The answer will depend on the linkage strategy you choose. I'll give the example of single linkage.

    First, you can construct the distance matrix of your data.

    from sklearn.metrics.pairwise import pairwise_distances
    dist_matrix = pairwise_distances(X)
    

    Then, you'll extract the nearest cluster:

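    import numpy as np

    # Assumed inputs: unclustered_points holds the row indices of the noise points,
    # clusters is a list of index arrays, one array per cluster.
    for point in unclustered_points:
        distances = []
        for cluster in clusters:
            distance = dist_matrix[point, cluster].min()  # Single linkage
            distances.append(distance)
        nearest = int(np.argmin(distances))  # Position of the closest cluster in `clusters`
        print("The nearest cluster for {} is {} (distance {})".format(point, nearest, distances[nearest]))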
    

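    EDIT: This works, but it's O(n^2) in both time and memory because it builds the full pairwise distance matrix, as noted by Anony-Mousse. Comparing each noise point only against the core points is a better idea because it cuts down on your work. In addition, it is somewhat similar to centroid linkage.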
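
As a rough sketch of the core-point idea, the snippet below fits DBSCAN, then computes distances only between the noise points and the core points, so the matrix is n_noise x n_core instead of n x n. The helper name `noise_to_nearest_cluster`, the toy data, and the `eps` / `min_samples` values are made up for illustration; `labels_` and `core_sample_indices_` are the attributes scikit-learn's DBSCAN exposes after fitting. Treat it as one possible way to do this, not the only one.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances


def noise_to_nearest_cluster(X, db):
    """Hypothetical helper: for each DBSCAN noise point, find the closest
    core point and report that core point's cluster label and distance."""
    labels = db.labels_
    noise_idx = np.flatnonzero(labels == -1)   # rows DBSCAN marked as noise
    core_idx = db.core_sample_indices_         # rows that are core points
    if noise_idx.size == 0 or core_idx.size == 0:
        return noise_idx, np.array([], dtype=int), np.array([])
    # Only the noise-vs-core block of the distance matrix is computed.
    d = pairwise_distances(X[noise_idx], X[core_idx])
    nearest = d.argmin(axis=1)                 # closest core point per noise point
    nearest_label = labels[core_idx][nearest]
    nearest_dist = d[np.arange(noise_idx.size), nearest]
    return noise_idx, nearest_label, nearest_dist


# Toy data (illustrative values): two tight clusters and one obvious outlier.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [2.0, 0.0],
])
db = DBSCAN(eps=0.3, min_samples=3).fit(X)

noise_idx, nearest_label, nearest_dist = noise_to_nearest_cluster(X, db)
for i, lab, dist in zip(noise_idx, nearest_label, nearest_dist):
    print("point {}: nearest cluster {}, distance {:.3f}".format(i, lab, dist))
```

The returned distance can be used directly as the "how abnormal is this point" score from the question. Dividing it by `eps` is one optional way to make it comparable across datasets, though that is a design choice rather than anything DBSCAN prescribes.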