I used DBSCAN from sklearn in Python, and the result only gives the cluster label of each point. I also want to calculate the confidence of the clustering, or just the average distance of the clusters from each other.
Do you guys have any idea?
I don't think this functionality is supported by scikit-learn. Cluster confidence is not really a thing here, as DBSCAN does not use cluster probabilities. However, calculating cluster distances is relatively straightforward.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import dbscan
# Get data & labels
data = load_iris()['data']
labels = dbscan(data)[1]
# Cluster ids, excluding the noise label (-1)
clusters = np.unique(labels[labels != -1])
# Initialize results
cluster_means = np.zeros((len(clusters), data.shape[1]))
cluster_distances = np.zeros((len(data), len(clusters)))
# Loop through clusters (noise is already excluded above)
for i, cluster in enumerate(clusters):
    # Get cluster mean
    cluster_mean = np.mean(data[labels == cluster], axis=0)
    # Set cluster mean
    cluster_means[i, :] = cluster_mean
    # Set distance from every point to this cluster mean
    cluster_distances[:, i] = np.linalg.norm(data - cluster_mean, axis=1)
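If what you are after is the distance of the clusters from each other, one possible way is to compare the cluster means pairwise. The following is only a rough sketch under my own assumptions: the helper name `cluster_distance_summary` and its `noise_label` parameter are made up for illustration (they are not part of scikit-learn), Euclidean distance is assumed, and the toy points and labels are hand-written rather than real DBSCAN output.

```python
import numpy as np

def cluster_distance_summary(data, labels, noise_label=-1):
    """Hypothetical helper: summarize distances between and within clusters.

    Returns (cluster ids, centroid-to-centroid distance matrix,
    mean distance of each cluster's members to their own centroid).
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    # Cluster ids without the noise label
    clusters = np.unique(labels[labels != noise_label])
    # One centroid per cluster, shape (k, n_features)
    means = np.array([data[labels == c].mean(axis=0) for c in clusters])
    means = means.reshape(len(clusters), data.shape[1])
    # Pairwise Euclidean distances between centroids, shape (k, k)
    between = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    # Average distance of each cluster's points to their own centroid
    within = np.array([
        np.linalg.norm(data[labels == c] - m, axis=1).mean()
        for c, m in zip(clusters, means)
    ])
    return clusters, between, within

# Toy example: two small clusters and one noise point (label -1)
points = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0], [10.0, 10.0]])
toy_labels = np.array([0, 0, 1, 1, -1])
clusters, between, within = cluster_distance_summary(points, toy_labels)
print(between)  # centroid-to-centroid distances
print(within)   # mean distance to own centroid, per cluster
```

With the iris example you would pass `data` and `labels` instead of the toy arrays. Comparing `within` against the off-diagonal entries of `between` gives a crude feeling for how well separated the clusters are, but it is not a probability or a confidence in any strict sense.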