python scikit-learn tsne

sklearn.manifold.TSNE gives different results for the same input vectors


I give TSNE a list of vectors, some of which are exactly the same. But the output of fit_transform() can be different for each of them! Is this expected behavior? How can I ensure that identical input vectors are mapped to the same output vector?

Additionally, I cannot tell for sure, but I have even noticed that the first entry in the input list of vectors always gets a different, unexpected value.

Consider the following simple example.

Notice that the first three vectors are the same and the random_state is fixed, yet the first three 2D vectors in the output can differ from each other.

from sklearn import manifold
import numpy as np

X = np.array([[2, 1, 3, 5],
              [2, 1, 3, 5],
              [2, 1, 3, 5],
              [2, 1, 3, 5],
              [12, 1, 3, 5],
              [87, 22, 3, 5],
              [3, 23, 9, 5],
              [43, 87, 3, 5],
              [121, 65, 3, 5]])

m = manifold.TSNE(
    n_components=2,
    perplexity=0.666666,
    verbose=0,
    random_state=42,
    angle=.99,
    init='pca',
    metric='cosine',
    n_iter=1000)

X_embedded = m.fit_transform(X)

# The following might fail: identical inputs are not guaranteed identical outputs
assert np.array_equal(X_embedded[1], X_embedded[2])
assert np.array_equal(X_embedded[0], X_embedded[1])

Update: sklearn.__version__ is '1.2.0'.


Solution

  • t-SNE, as presented by van der Maaten and Hinton (2008), is a technique that "visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map".

    There is no guarantee that two identical points are mapped to the same low-dimensional point. As a matter of fact, it almost never happens, as one can see from Algorithm 1 in (van der Maaten and Hinton 2008): the points in the low-dimensional space are obtained by gradient descent on a cost function, starting from a random initialisation.
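If identical inputs must end up at the same output location, one possible workaround (not part of t-SNE itself, and not something the paper or scikit-learn prescribes) is to embed only the distinct rows and then copy each result back to its duplicates. The sketch below assumes this is acceptable for your use case; the helper names embed_with_shared_duplicates and tsne_embed are hypothetical, and the TSNE parameters are simply the ones from the question.

```python
import numpy as np


def embed_with_shared_duplicates(X, embed_fn):
    """Embed only the distinct rows of X, then give every duplicate the same result.

    embed_fn is any callable mapping an (n, d) array to an (n, k) array.
    """
    X = np.asarray(X)
    # unique_rows holds each distinct row once; inverse[i] is the index in
    # unique_rows of the original row X[i].
    unique_rows, inverse = np.unique(X, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)  # flatten defensively in case it is not 1-D
    embedded_unique = np.asarray(embed_fn(unique_rows))
    # Fancy indexing copies the embedding of each distinct row to all its duplicates.
    return embedded_unique[inverse]


def tsne_embed(rows):
    """Hypothetical wrapper around TSNE using the parameters from the question."""
    # Imported here so the helper above only depends on NumPy.
    from sklearn.manifold import TSNE

    model = TSNE(
        n_components=2,
        perplexity=0.666666,
        random_state=42,
        angle=0.99,
        init="pca",
        metric="cosine",
    )
    return model.fit_transform(rows)
```

With X from the question, X_embedded = embed_with_shared_duplicates(X, tsne_embed) returns one 2D point per input row, and rows that are exactly equal share the same point by construction. Note that this changes what t-SNE sees: duplicates no longer contribute to the neighbour probabilities, so the layout can differ from one fitted on the full data. It also only catches rows that are bit-for-bit equal; nearly identical rows are still treated as distinct.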