python matplotlib grouping cluster-analysis k-means

Matplotlib: detect different groups of points and enclose each one in a circle


I would like to automatically detect and isolate the different groups of points plotted on a graph. I drew 3 groups and would like to detect each of them and enclose it in a circle.

This is what I currently have:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial import ConvexHull



########## POINT GENERATION ##########

# Generate ~1000 random points (3 x 333) split into three groups
n_points = 1000

# Group 1
mean1 = [0.2, 0.5]
cov1 = [[0.01, 0], [0, 0.01]]
x1, y1 = np.random.multivariate_normal(mean1, cov1, n_points // 3).T

# Group 2
mean2 = [0.7, 0.3]
cov2 = [[0.01, 0], [0, 0.01]]
x2, y2 = np.random.multivariate_normal(mean2, cov2, n_points // 3).T

# Group 3
mean3 = [0.5, 0.7]
cov3 = [[0.01, 0], [0, 0.01]]
x3, y3 = np.random.multivariate_normal(mean3, cov3, n_points // 3).T

# Plot the points
plt.scatter(x1, y1, label='Group 1', color='purple', alpha=0.5)
plt.scatter(x2, y2, label='Group 2', color='purple', alpha=0.5)
plt.scatter(x3, y3, label='Group 3', color='purple', alpha=0.5)

plt.xlabel('X')
plt.ylabel('Y')
plt.title('Plot of 1000 points split into slightly separated groups')
plt.legend()

########## GROUP DETECTION AND ISOLATION ##########

# Apply K-means to group the points into 3 clusters
kmeans = KMeans(n_clusters=3)
points = np.column_stack((x1, y1, x2, y2, x3, y3))
kmeans.fit(points)

# Get the cluster centers
cluster_centers = kmeans.cluster_centers_

# Compute the CME (enclosing circle radius) for each group
cme_radii = []
for center in cluster_centers:
    distances = np.linalg.norm(points - center, axis=1)
    cme_radius = np.max(distances)
    cme_radii.append(cme_radius)

# Plot the cluster centers
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], marker='o', s=200, color='red', label='Cluster centers')

# Draw a circle around each group
for i, center in enumerate(cluster_centers):
    circle = plt.Circle(center, cme_radii[i], color='blue', fill=False, linestyle='--')
    plt.gca().add_patch(circle)

##########

plt.show()

[Image: plot produced by the code above]

The problem is that the groups are not detected correctly, as the picture shows.

Also, the 3 groups in this example are easy to see, but when I have 2 or more groups, how do I detect the number of clusters?

Lastly, note that there may be no clusters at all: in that case, all points are close enough to form a single group, so no circle needs to be drawn.


Solution

  • That's because the CME radius of each cluster must be computed from the distances between that cluster's center and the points belonging to that cluster, not all points. Also, the array passed to fit should be long, with shape (N, 2), not wide (here, (999, 2) rather than (333, 6)).

    # Reshape the wide (333, 6) array into a long (999, 2) array of (x, y) pairs
    points = np.column_stack((x1, y1, x2, y2, x3, y3)).reshape(-1, 2)
    
    kmeans.fit(points)
    cluster_centers = kmeans.cluster_centers_
    labels = kmeans.labels_  # cluster index assigned to each point
    
    cme_radii = []
    for i, center in enumerate(cluster_centers):
        # Only measure distances to the points assigned to cluster i
        distances = np.linalg.norm(points[labels == i] - center, axis=1)
        cme_radius = np.max(distances)
        cme_radii.append(cme_radius)

    [Image: plot produced by the corrected code]
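
The two fixes can be checked on a tiny hand-made data set. The following is only a minimal sketch, not the answerer's code: the helper names `to_long` and `cluster_radii` are hypothetical, and the labels and centers are set by hand instead of coming from KMeans so that the result is easy to verify.

```python
import numpy as np


def to_long(*coords):
    """Stack x1, y1, x2, y2, ... (equal-length 1-D arrays) into an (N, 2) array."""
    # column_stack gives one row per index with all groups side by side;
    # reshape(-1, 2) then splits every row into consecutive (x, y) pairs.
    return np.column_stack(coords).reshape(-1, 2)


def cluster_radii(points, labels, centers):
    """Radius per cluster: largest distance from its center to its own members."""
    radii = []
    for i, center in enumerate(centers):
        members = points[labels == i]  # keep only the points of cluster i
        radii.append(np.linalg.norm(members - center, axis=1).max())
    return np.array(radii)


# Two groups of two points each (illustrative values).
x1, y1 = np.array([0.0, 2.0]), np.array([0.0, 0.0])
x2, y2 = np.array([10.0, 10.0]), np.array([0.0, 4.0])

points = to_long(x1, y1, x2, y2)
# The rows come out interleaved: group 1, group 2, group 1, group 2.
labels = np.array([0, 1, 0, 1])
centers = np.array([[1.0, 0.0], [10.0, 2.0]])

radii = cluster_radii(points, labels, centers)
print(points.shape, radii)
```

Measuring from the first center to all four points instead would give a radius of about 9.85, because the farthest point belongs to the other group. That is the kind of oversized circle described in the question.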