Tags: python, pytorch, runtime-error, tensor, bert-language-model

RuntimeError: stack expects each tensor to be equal size, but got [7, 768] at entry 0 and [8, 768] at entry 1


When running this code:

embedding_matrix = torch.stack(embeddings)

I got this error:

RuntimeError: stack expects each tensor to be equal size, but got [7, 768] at entry 0 and [8, 768] at entry 1

I'm trying to get embeddings using BERT via:

split_sent = sent.split()
tokens_embedding = []
j = 0
for full_token in split_sent:
    curr_token = ''
    for i, _ in enumerate(tokenized_sent[1:]):
        token = tokenized_sent[i + j]
        piece_embedding = bert_embedding[i + j]
        if token == full_token and curr_token == '':
            tokens_embedding.append(piece_embedding)
            j += 1
            break
sent_embedding = torch.stack(tokens_embedding)
embeddings.append(sent_embedding)
embedding_matrix = torch.stack(embeddings)

Does anyone know how I can fix this?


Solution

  • As the PyTorch docs for torch.stack() state, all input tensors must have the same shape to be stacked. Your sentences contain different numbers of tokens, so the per-sentence tensors ([7, 768] vs. [8, 768]) differ in their first dimension. Depending on how you intend to use embedding_matrix, you have two options: pad each sentence tensor with zeros up to a common length so they become equidimensional (recommended if you will train with the stacked tensor; refer to this tutorial), or simply concatenate them along the token dimension with torch.cat(embeddings, dim=0).
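A minimal sketch of both options, using randomly generated stand-ins for your per-sentence embeddings (the [7, 768] and [8, 768] shapes are taken from the error message; the variable names are illustrative, not from your code):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Stand-ins for two sentence embeddings with different token counts,
# matching the shapes reported in the RuntimeError.
embeddings = [torch.randn(7, 768), torch.randn(8, 768)]

# Option 1: zero-pad to the longest sentence so the tensors are
# equidimensional. With batch_first=True, pad_sequence returns a single
# tensor of shape [num_sentences, max_len, 768] -- already "stacked".
embedding_matrix = pad_sequence(embeddings, batch_first=True)
print(embedding_matrix.shape)  # torch.Size([2, 8, 768])

# Option 2: concatenate along the token dimension instead of stacking,
# producing one flat [total_tokens, 768] tensor (sentence boundaries
# are lost, so keep the lengths if you need to recover them).
flat = torch.cat(embeddings, dim=0)
print(flat.shape)  # torch.Size([15, 768])
```

Note that pad_sequence pads with zeros by default (padding_value=0.0), so shorter sentences end in all-zero rows; if you feed this into a model, you will usually also want an attention/padding mask marking the real tokens.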