python scikit-learn distribution normal-distribution multivariate-testing

Distribution of Isotropic Gaussian Blobs generated by sklearn.datasets.make_blobs()?


Could someone explain the meaning of the isotropic Gaussian blobs generated by sklearn.datasets.make_blobs()? I don't understand the term, and the sklearn documentation only says "Generate isotropic Gaussian blobs for clustering". I have also gone through this question.

So, here's my doubt:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import make_blobs

# generate the data set
X, y = make_blobs(n_samples = 100000, n_features = 2, centers = 2, random_state = 2, cluster_std = 1.5)

# scatter plot of blobs
plt.scatter(X[:, 0], X[:, 1], c = y, s = 50, cmap = 'RdBu')


# distribution of first feature
sns.histplot(x = X[:, 0], kde = True) 

The distribution of this feature is approximately normal.

# distribution of second feature
sns.histplot(x = X[:, 1], kde = True, color = "green", alpha = 0.2)

The distribution of the second feature is bimodal, which is not normal.

# overall distribution of values
sns.histplot(x = X.flatten(), color = "red", kde = True, alpha = .5)

Which is also not normal!


# variance-covariance matrix of the features
np.cov(X[:, 0], X[:, 1])

Output

array([[ 3.55546911,  4.70526192],
       [ 4.70526192, 19.00023664]])

What does "Gaussian" actually mean here? It might be a silly question, so apologies in advance.


Solution

  • Here is the idea in a nutshell.

    A code snippet for understanding make_blobs() is in this notebook: make_blobs_notebook
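To make the point concrete, here is a minimal sketch (not taken from the linked notebook) that reuses the call from the question. "Isotropic Gaussian" describes each blob on its own: within one cluster, every feature has the same standard deviation (cluster_std) and the features are uncorrelated. The histograms and the covariance matrix in the question pool both blobs, so they describe a mixture of two Gaussians instead. The variable names below are only illustrative, and the 0.25 factor assumes two blobs of equal size, which is what centers = 2 produces here.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same data as in the question.
X, y = make_blobs(n_samples=100000, n_features=2, centers=2,
                  random_state=2, cluster_std=1.5)

# Covariance inside each blob: expected to be close to cluster_std**2 * I,
# i.e. equal variances (about 2.25) and near-zero off-diagonal terms.
per_cluster_cov = {k: np.cov(X[y == k], rowvar=False) for k in (0, 1)}
for k, cov in per_cluster_cov.items():
    print(f"covariance within blob {k}:\n{cov.round(2)}")

# Covariance of the pooled data: the within-blob part plus a term that comes
# only from the distance between the two blob means (equal weights assumed).
means = {k: X[y == k].mean(axis=0) for k in (0, 1)}
diff = means[0] - means[1]
pooled_expected = 1.5**2 * np.eye(2) + 0.25 * np.outer(diff, diff)
pooled_observed = np.cov(X, rowvar=False)  # same matrix as np.cov(X[:, 0], X[:, 1])

print("difference between blob means:", diff.round(2))
print("pooled covariance, observed:\n", pooled_observed.round(2))
print("pooled covariance, expected:\n", pooled_expected.round(2))
```

The same separation term explains the histograms. For two equally weighted Gaussians with the same standard deviation, the pooled distribution of a feature has two peaks only when the means differ by more than about two standard deviations along that feature. If the blob means are close along the first feature and far apart along the second, the first histogram looks roughly bell-shaped (although it is still a mixture, not exactly normal) and the second is clearly bimodal. Plotting X[y == 0] and X[y == 1] separately should show a normal-looking histogram for each feature.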