
Do I fit t-SNE on the two sets of embeddings from two different models at the same time, or fit each set separately and then visualize and compare?


I have two sets of embeddings from two different GNNs, and I want to compare them visually. Which approach is most appropriate for the comparison? Do I fit t-SNE separately on each set of embeddings and then make a scatter plot of each? Or do I fit both sets together in a single t-SNE?


Solution

  • If you fit t-SNE separately for each embedding set, you preserve each embedding space's local neighborhood structure as well as t-SNE can, because each optimization runs independently. However, the resulting scatter plots are not directly comparable: t-SNE's result depends on its random initialization, and its objective depends only on pairwise distances in the output, so each layout can come out arbitrarily rotated, reflected, or shifted, and its axes carry no meaning. You can still examine each set's clusters in isolation, but you cannot read the relationship between the two embedding sets directly off the two plots.

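    If you fit t-SNE on both embedding sets together, all embeddings are projected into one shared low-dimensional space. This allows direct visual comparison and makes it easier to see whether the embeddings from the two GNNs cluster similarly. It requires both sets to have the same dimensionality, and since two independently trained GNNs do not share a coordinate system, their embeddings may simply separate into two clouds unless they are normalized or aligned first. This approach also risks distorting the internal structure of each embedding space, especially if the embeddings differ significantly: each point's neighbors are now drawn from the combined data, so t-SNE has to balance relationships between and within the two sets, which can introduce artifacts.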

    The choice depends on your objective: to analyze each model's internal structure independently, fit t-SNE separately; to compare the two models against each other in a shared space, fit t-SNE jointly.

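    To complement the visualization, include quantitative metrics such as cosine similarity or cluster purity to back up your observations, since visualizations should be treated as exploratory tools rather than conclusive evidence. Because the two models' coordinates are not aligned, cosine similarity is most meaningful when computed within each model (for example, by comparing the two models' pairwise similarity matrices over the same nodes) rather than between one model's vector and the other's; cluster purity additionally requires ground-truth node labels.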
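A minimal sketch of both options, assuming scikit-learn's `TSNE` and that row *i* of each array is the embedding of the same node in both models. The function names `tsne_separate` and `tsne_joint`, the per-model standardization, and the perplexity value are illustrative choices, not part of the answer above.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def tsne_separate(emb_a, emb_b, perplexity=30.0, seed=0):
    # Independent fits: each 2-D layout has its own arbitrary orientation,
    # so the two resulting plots do not share axes.
    xy_a = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed).fit_transform(emb_a)
    xy_b = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed).fit_transform(emb_b)
    return xy_a, xy_b


def tsne_joint(emb_a, emb_b, perplexity=30.0, seed=0):
    # Pairwise distances across the two sets require equal dimensionality.
    if emb_a.shape[1] != emb_b.shape[1]:
        raise ValueError("joint t-SNE needs embeddings of equal dimensionality")
    # Assumption: standardize each model's embeddings separately so one model's
    # larger scale does not dominate the shared distances. This does NOT align
    # the two models' latent coordinate systems.
    stacked = np.vstack([StandardScaler().fit_transform(emb_a),
                         StandardScaler().fit_transform(emb_b)])
    xy = TSNE(n_components=2, perplexity=perplexity, init="pca",
              random_state=seed).fit_transform(stacked)
    # Source label per row (0 = model A, 1 = model B), e.g. for coloring the plot.
    source = np.concatenate([np.zeros(len(emb_a), dtype=int),
                             np.ones(len(emb_b), dtype=int)])
    return xy, source
```

To visualize, draw `xy_a` and `xy_b` as two side-by-side scatter plots, or color the joint layout by `source` (for example `plt.scatter(xy[:, 0], xy[:, 1], c=source)` with matplotlib). Fixing `random_state` and using `init="pca"` makes each run reproducible, but it does not put two separate fits into a shared coordinate system.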
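One way to make the suggested metrics concrete, as a minimal sketch assuming NumPy and that row *i* of both arrays describes the same node: compare the two models' within-model cosine-similarity matrices (this tolerates unaligned coordinates and even different dimensionalities), and compute the purity of a clustering against known node labels. The helper names are hypothetical, and `cluster_ids` is assumed to come from a clustering you run yourself (for example k-means on each embedding set).

```python
import numpy as np


def cosine_similarity_matrix(emb):
    # L2-normalize each row (assumes no all-zero rows); dot products are then cosines.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T


def similarity_structure_agreement(emb_a, emb_b):
    # Pearson correlation between the two models' pairwise cosine similarities.
    if len(emb_a) != len(emb_b):
        raise ValueError("both models must embed the same nodes in the same order")
    sim_a = cosine_similarity_matrix(emb_a)
    sim_b = cosine_similarity_matrix(emb_b)
    iu = np.triu_indices(len(emb_a), k=1)  # each node pair once, no diagonal
    return float(np.corrcoef(sim_a[iu], sim_b[iu])[0, 1])


def cluster_purity(cluster_ids, true_labels):
    # Fraction of nodes that carry the majority label of their cluster.
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    matched = 0
    for c in np.unique(cluster_ids):
        _, counts = np.unique(true_labels[cluster_ids == c], return_counts=True)
        matched += counts.max()
    return matched / len(true_labels)
```

An agreement near 1.0 means the two models agree on which nodes are similar even if their coordinates differ (for example, when one space is a rotation of the other); a purity near 1.0 means the clusters are nearly label-homogeneous.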