Tags: scipy, pytorch, tensor, tsne

Plot PyTorch vectors with TSNE


I am training the ESM-1b model on some protein sequences. I already have the embedding vectors and now want to plot them using TSNE. However, when I pass the vectors to the TSNE model I get:

'list' object has no attribute 'shape'

How should I plot the PyTorch vectors (they are actually PyTorch tensors)?

The code I have so far:

sequence_representations = []
for i, (_, seq) in enumerate(new_list):
   sequence_representations.append(token_representations[i, 1 : len(seq) + 1].mean(0))

This is an example of the PyTorch tensors I have (sequence_representations):

[tensor([-0.0054,  0.1090, -0.0046,  ...,  0.0465,  0.0426, -0.0675]),
 tensor([-0.0025,  0.0228, -0.0521,  ..., -0.0611,  0.1010, -0.0103]),
 tensor([ 0.1168, -0.0189, -0.0121,  ..., -0.0388,  0.0586, -0.0285]),......

TSNE:

X_embedded = TSNE(n_components=2, learning_rate='auto', init='random').fit_transform(sequence_representations)  # this is where I get the error

Solution

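  • Assuming you are using scikit-learn's TSNE (sklearn.manifold.TSNE, which is not part of SciPy), you'll need sequence_representations to be an

    ndarray of shape (n_samples, n_features)

    Right now you have a Python list of 1-D PyTorch tensors, and a list has no shape attribute.

    To convert sequence_representations to a NumPy ndarray, stack the tensors and then convert the result: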

    import torch

    seq_tensor = torch.stack(sequence_representations)  # list of 1-D tensors -> one 2-D tensor (n_samples, n_features)
    seq_np = seq_tensor.detach().cpu().numpy()  # detach from autograd, move to CPU, convert to NumPy