[SOLVED] how to get the confidence of clustering created by dbscan in python

I used sklearn.cluster.dbscan in Python, and the result only gives the cluster label of each point. I would also like to calculate a confidence for the clustering, or at least the average distance between the clusters.

Do you guys have any idea?

Solution

I don't think this functionality is supported by scikit-learn. Cluster confidence is not really a thing for DBSCAN, because the algorithm assigns hard labels and does not compute cluster probabilities. Calculating the distance from each point to each cluster mean is relatively straightforward, though.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import dbscan


# Get data & labels
data = load_iris()['data']
labels = dbscan(data)[1]

# Cluster ids, excluding the noise label (-1) if it is present
clusters = np.unique(labels[labels != -1])

# Initialize results
cluster_means = np.zeros((len(clusters), data.shape[1]))
cluster_distances = np.zeros((len(data), len(clusters)))

# Loop through clusters; i is the column index, cluster is the label
for i, cluster in enumerate(clusters):
    # Get cluster mean
    cluster_mean = np.mean(data[labels == cluster], axis=0)

    # Set cluster mean
    cluster_means[i, :] = cluster_mean

    # Set the distance from every point to this cluster mean
    cluster_distances[:, i] = np.linalg.norm(data - cluster_mean, axis=1)
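
If what you want is the distance between the clusters themselves, one option is to compare the cluster means pairwise. The sketch below does that. It also computes per-point silhouette values as a rough stand-in for "confidence". To be clear, the silhouette is my substitution here, not something DBSCAN produces, and it assumes Euclidean distance and at least two non-noise clusters. The variable names (mask, mean_distances, silhouettes) are just illustrative.

```python
import numpy as np
from sklearn.cluster import dbscan
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances, silhouette_samples

# Get data & labels (default eps and min_samples, as in the code above)
data = load_iris()['data']
labels = dbscan(data)[1]

# Ignore noise points, which DBSCAN labels as -1
mask = labels != -1
clusters = np.unique(labels[mask])

# One mean vector per cluster, in the same order as `clusters`
cluster_means = np.array([data[labels == c].mean(axis=0) for c in clusters])

# Pairwise Euclidean distances between cluster means;
# entry [a, b] is the distance between clusters[a] and clusters[b]
mean_distances = pairwise_distances(cluster_means)

# Per-point silhouette values in [-1, 1]; only defined for 2+ clusters
if len(clusters) >= 2:
    silhouettes = silhouette_samples(data[mask], labels[mask])
else:
    silhouettes = None
```

Values of silhouettes close to 1 mean a point sits well inside its own cluster, while values near 0 or below mean it lies close to another cluster. Averaging mean_distances over the off-diagonal entries gives a single "average distance between clusters" number, if that is all you need.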