python seaborn cluster-analysis dbscan tsne

Calculating the cluster size in t-SNE

I've been running t-SNE on my data and clustering the embedding with DBSCAN. I then assign the t-SNE coordinates to the original dataframe and plot them with a seaborn scatterplot. This is the code:

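import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN
from bioinfokit.visuz import cluster

# df_tsne (numeric features) and filter_df (metadata with Species and Gender) are defined elsewhere
# n_iter was renamed max_iter in scikit-learn 1.5; use max_iter on newer versions
tsne_em = TSNE(n_components=3, perplexity=50.0, n_iter=1000, verbose=1).fit_transform(df_tsne)

cluster.tsneplot(score=tsne_em)

# One DBSCAN label per row (-1 = noise); these labels are not stored in filter_df yet
get_clusters = DBSCAN(eps=4, min_samples=10).fit_predict(tsne_em)

# Only the first two of the three t-SNE components are plotted
filter_df['x'] = tsne_em[:,0]
filter_df['y'] = tsne_em[:,1]

g = sns.scatterplot(x='x', y='y', hue='Species', style='Gender', data=filter_df)
g.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.savefig('Seaborn-MF-Species-TSNE-EPS4.png', dpi=600, bbox_inches='tight')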

This is the resulting plot (image not included):

I have seen people calculate the size of each cluster (number of cells, percentages, etc.) and do other post-analysis, but I haven't found any code for it. Does anybody know how I can, for example, circle the clusters and show the number of cells in each one? I have several of these graphs, and this would really help make the results easier to understand.

Solution

If you want the cluster sizes, you only need to tabulate the labels returned by DBSCAN. For example, with this simulated dataset:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
import pandas as pd
import seaborn as sns

# Simulated data: 200 points in 5 dimensions drawn from 3 blobs
X, y = make_blobs(n_samples=200, centers=3, n_features=5, random_state=99)

# 2-D t-SNE embedding, used here only for plotting
tsne_em = TSNE(n_components=2, init='pca', learning_rate=1).fit_transform(X)
# DBSCAN is run on the original features; noise points get the label -1
get_clusters = DBSCAN(eps=2, min_samples=5).fit_predict(X)

df = pd.DataFrame(tsne_em, columns=['tsne1', 'tsne2'])
df['dbscan'] = get_clusters
df['actual'] = y

Plot the DBSCAN clusters on the t-SNE coordinates:

sns.scatterplot(x = "tsne1", y = "tsne2",hue = "dbscan",data=df)
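The question also asks how to circle the clusters. Seaborn has no built-in option for this, so one approach (my own choice, not the only one) is to outline each DBSCAN cluster with its convex hull using scipy.spatial.ConvexHull and draw it on top of the scatterplot. This sketch repeats the simulated data above; random_state=0 for TSNE and the output file name are added for illustration, and noise points (label -1) are skipped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.spatial import ConvexHull
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Same simulated data and DBSCAN parameters as in the answer
X, y = make_blobs(n_samples=200, centers=3, n_features=5, random_state=99)
tsne_em = TSNE(n_components=2, init="pca", learning_rate=1, random_state=0).fit_transform(X)
labels = DBSCAN(eps=2, min_samples=5).fit_predict(X)

df = pd.DataFrame(tsne_em, columns=["tsne1", "tsne2"])
df["dbscan"] = labels

ax = sns.scatterplot(x="tsne1", y="tsne2", hue="dbscan", data=df)

hulls = {}
for label, group in df.groupby("dbscan"):
    if label == -1 or len(group) < 3:
        continue  # skip DBSCAN noise and clusters too small for a 2-D hull
    points = group[["tsne1", "tsne2"]].to_numpy()
    hull = ConvexHull(points)
    # hull.vertices holds the indices of the outline points in order;
    # repeating the first vertex closes the polygon
    outline = points[np.append(hull.vertices, hull.vertices[0])]
    ax.plot(outline[:, 0], outline[:, 1], color="black", linewidth=1)
    hulls[label] = hull

plt.savefig("tsne_dbscan_hulls.png", dpi=150, bbox_inches="tight")
```

A convex hull follows the outermost points, so a single stray point stretches it; drawing an ellipse from each cluster's covariance is a common alternative if that looks misleading.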

The cluster sizes come from value_counts() on the label column (DBSCAN labels noise points as -1):

df['dbscan'].value_counts()

 1    63
 2    63
 0    59
-1    15
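To show the number of cells on the figure itself, these counts can be written onto the scatterplot. A minimal sketch on the same simulated data; placing each label at the cluster's median t-SNE position, the label format, and the file name are my own choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, y = make_blobs(n_samples=200, centers=3, n_features=5, random_state=99)
tsne_em = TSNE(n_components=2, init="pca", learning_rate=1, random_state=0).fit_transform(X)
df = pd.DataFrame(tsne_em, columns=["tsne1", "tsne2"])
df["dbscan"] = DBSCAN(eps=2, min_samples=5).fit_predict(X)

counts = df["dbscan"].value_counts()
# Median position of each cluster: less sensitive to stray points than the mean
centers = df.groupby("dbscan")[["tsne1", "tsne2"]].median()

ax = sns.scatterplot(x="tsne1", y="tsne2", hue="dbscan", data=df)
texts = []
for label, (cx, cy) in centers.iterrows():
    if label == -1:
        continue  # noise points are scattered, so one label position is meaningless
    n = counts.loc[label]
    pct = 100 * n / len(df)
    texts.append(ax.text(cx, cy, f"n={n}\n({pct:.1f}%)",
                         ha="center", va="center", fontsize=9, fontweight="bold"))

plt.savefig("tsne_dbscan_counts.png", dpi=150, bbox_inches="tight")
```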

As proportions (multiply by 100 for percentages):

df['dbscan'].value_counts(normalize=True)

 1    0.315
 2    0.315
 0    0.295
-1    0.075
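If you want counts and percentages side by side, plus percentages that ignore the noise points, both value_counts calls can be combined into one table. A sketch on the same simulated data (t-SNE is not needed here, since the counts depend only on the DBSCAN labels); the column names n_points, percent and percent_of_clustered are hypothetical:

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=3, n_features=5, random_state=99)
labels = pd.Series(DBSCAN(eps=2, min_samples=5).fit_predict(X), name="dbscan")

# Counts and percentages of all points, aligned on the cluster label
summary = pd.DataFrame({
    "n_points": labels.value_counts(),
    "percent": labels.value_counts(normalize=True).mul(100).round(1),
}).sort_index()

# Percentages among clustered points only: noise (-1) is dropped,
# so its row gets NaN in this column
clustered = labels[labels != -1]
summary["percent_of_clustered"] = clustered.value_counts(normalize=True).mul(100).round(1)
print(summary)
```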

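You can also cross-tabulate the clusters against other labels. Here I used the true blob labels (the actual column); with your data you can use your own annotations instead, such as Species or Gender:

pd.crosstab(df['dbscan'], df['actual'])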

actual  0   1   2
dbscan          
   -1   4   8   3
    0   0   59  0
    1   0   0   63
    2   63  0   0
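pd.crosstab can also add row and column totals (margins=True) or convert the table into the composition of each cluster in percent (normalize='index'). A sketch on the simulated data; with your own data you would pass a column such as Species in place of actual:

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=3, n_features=5, random_state=99)
df = pd.DataFrame({"dbscan": DBSCAN(eps=2, min_samples=5).fit_predict(X), "actual": y})

# Raw counts with an extra "All" row and column holding the totals
counts = pd.crosstab(df["dbscan"], df["actual"], margins=True)

# Row percentages: what share of each DBSCAN cluster carries each label
composition = (pd.crosstab(df["dbscan"], df["actual"], normalize="index")
               .mul(100)
               .round(1))
print(counts)
print(composition)
```

Using normalize='columns' instead answers the reverse question: how each label is spread across the clusters.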